Abstract

We consider the question of interactive communication, in which two remote parties perform a computation while their communication channel is (adversarially) noisy. We extend the discussion to a more general and stronger class of noise, namely, we allow the channel to perform insertions and deletions of symbols. These types of errors may bring the parties “out of sync”, so that there is no consensus regarding the current round of the protocol.

In this more general noise model, we obtain the first interactive coding scheme that has a constant rate and tolerates noise rates of up to 1/18 − ε. To this end we develop a novel primitive, which we call an edit distance tree code. The edit distance tree code is designed to replace the Hamming distance constraints in Schulman’s tree codes (STOC 93) with a stronger edit distance requirement. However, the straightforward generalization of tree codes to edit distance does not seem to yield a primitive that suffices for communication in the presence of synchronization problems. Giving the “right” definition of edit distance tree codes is a main conceptual contribution of this work.


1 Introduction 

In the setting of interactive communication, two remote parties, Alice and Bob, wish to run a distributed protocol over a noisy communication channel. The study of this problem was initiated by the seminal work of Schulman [251EZ112H], who showed a coding scheme for interactive protocols in which the communication complexity of the resilient protocol is larger than the communication of the input (noiseless) protocol by only a constant factor. Schulman’s coding schemes tolerate random noise, where each bit is flipped with a small probability, as well as some adversarial noise, where the only restriction on the noise is the number of bits flipped by the adversary. Subsequently, many works considered the question of interactive communication, obtaining coding schemes reaching optimality in terms of their computational efficiency da la a 011 US], communication efficiency [201 mm, and noise resilience, both in the standard setting piaisiiis] and in various other noise models and settings [MiiMiiniiiiiiziiiiiiiiisiiin].

The recent successes in developing the theory of interactive error-correcting codes brought the 
study of two-way interactive coding to nearly match what we know about good codes against 
adversarial noise in the one-way setting. 

So far, all works have focused on either substitutions (where Eve can replace a sent symbol with a different symbol from the alphabet) or erasures (where Eve can replace a sent symbol with the erasure symbol ⊥). In this work we extend the question of coding for interactive communication over noisy channels to a more general type of noise, namely, channels with insertions and deletions (indels). In the one-way setting, this corresponds to the insertion and deletion model, where Eve is allowed to completely remove transmitted symbols or to inject new symbols. Note that this model is stronger than the substitution model, since a substitution can always be implemented as a deletion followed by an insertion. Even in the one-way regime, this model is more difficult to analyze than the model with substitution errors. As an example, Schulman and Zuckerman [26] gave polynomial-time encodable and decodable codes for insertion/deletion errors. Their code can tolerate around ^
fraction of insertion/deletion errors. This should be contrasted with efficient codes in the standard 
noise setting, e.g., [T9|, tolerating about j fraction of bit flips. 

The major additional challenge in dealing with indels in the interactive setting, compared to both the non-interactive indel model and the interactive substitution model, is that we can no longer assume that Alice and Bob are synchronized: at a given time they may be at different stages of their sides of the protocol! Indeed, if Eve deletes Alice’s transmission to Bob and additionally injects a ‘spoofed’ reply from Bob back to Alice, then while Bob has received no message and assumes the protocol hasn’t advanced yet, Alice has received a spoofed reply and proceeds to the next step of her protocol. From this point on, unless the insertion/deletion is detected, the parties are unsynchronized, as they run different steps of the protocol. The challenge in dealing
with this model is to design a protocol that manages to succeed even without knowing whether the 
two parties are synchronized. 

1.1 Modeling insertions and deletions 

Some care is required when dealing with insertion and deletion noise patterns, as certain choices make the model too strong or too weak. For example, consider the case where Alice and Bob send each other symbols in an alternating way. Then, even if only a single deletion is allowed, the noise can cause the protocol to “hang”: Bob will be waiting for a symbol from Alice, while Alice will be waiting for Bob’s response. Clearly, such a model is too strong for our purpose, and we should restrict the allowed noise patterns to preserve the protocol’s liveness.

There are two main paradigms for distributed protocols in which parties are not fully synchronized. The first is a message-driven paradigm, in which each party “sleeps” until the arrival of a new message that triggers it into performing some computation and sending a message to the other party. The second is clock-driven, where each party holds a clock: at each clock tick, the party wakes up, checks the incoming message queue, performs some computation, and sends a message to the other side. The issue here is that different parties may have mismatching or skewed clocks. Then, instead of acting in an alternating manner, one party may wake up several times while the other party is still asleep.

We emphasize that if the parties have matching clocks, then no insertions and deletions are possible: channel corruption in this case either changes one symbol into another (as in a standard noisy channel) or causes a detectable corruption, i.e., an erasure. Both these types of noise are substantially weaker than insertions and deletions, and were already analyzed in previous work (e.g., [SHlIIKIIllin]).

Our noise model, which we describe shortly, makes sense for both of the above paradigms: it guarantees liveness in a message-driven setting; for the clock-driven model, we can show that any such setting reduces to our model, that is, any resilient protocol in our model can be used to obtain a resilient protocol in the clock-driven setting. The skewness of the clocks in that case is related to the noise resilience of the protocol in our model. See the full version for the complete details.

In this work we assume a message-driven setting, where the parties normally speak in an alternating manner. Any corruption in our model must be a deletion followed by an insertion. We call each such tampering an edit corruption.

Definition 1.1 (Edit Corruption). An edit-corruption is a single deletion followed by a single 
insertion (whether the inserted symbol is aimed at the same or the opposite party as the deleted 
message). 

This gives rise to two types of attacks Eve can perform: (i) delete a symbol and replace it with a different symbol (insert a symbol in the same direction as the deleted symbol; a substitution attack); (ii) delete a symbol and insert a spoofed reply from the other side (insert a symbol in the opposite direction of the deleted symbol). The second type of corruption has the effect of making the parties ‘out of sync’: one party advances one step in the protocol, while the other does not; see Figure 1. Note that a substitution costs a single corruption, i.e., it is counted as a single deletion followed by a single insertion. Also note that although an outside viewer can split Eve’s attack into pairs of deletion-and-insertion, the string that a certain party receives suffers, from that party’s own view, an arbitrary pattern of insertions/deletions.
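The following toy simulation illustrates how a single out-of-sync attack desynchronizes the parties in the message-driven model, while a substitution does not. It is a minimal sketch under our own assumptions: the names `Party` and `simulate` are hypothetical, each action describes Eve's treatment of the one symbol in flight, and only the bookkeeping of who receives what is modeled (no coding).

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    name: str
    sent: int = 0                                  # symbols sent so far (own round count)
    received: list = field(default_factory=list)   # symbols as this party sees them


def simulate(actions):
    """Toy message-driven alternating run. Each action is Eve's treatment of the
    symbol currently in flight: 'deliver', 'substitute' or 'out_of_sync'."""
    alice, bob = Party("Alice"), Party("Bob")
    other = {"Alice": bob, "Bob": alice}

    def send(party):
        party.sent += 1
        return party, f"{party.name[0]}{party.sent}"   # (sender, symbol) now in flight

    sender, symbol = send(alice)                        # Alice speaks first
    for action in actions:
        receiver = other[sender.name]
        if action == "deliver":
            receiver.received.append(symbol)
            sender, symbol = send(receiver)             # receipt triggers a reply
        elif action == "substitute":
            receiver.received.append("?")               # same direction, wrong symbol
            sender, symbol = send(receiver)
        elif action == "out_of_sync":
            sender.received.append("spoof")             # deleted; fake reply to the sender
            sender, symbol = send(sender)               # sender advances, receiver does not
        else:
            raise ValueError(f"unknown action {action!r}")
    return alice, bob


alice, bob = simulate(["deliver", "out_of_sync", "deliver", "deliver"])
print(alice.received, bob.received)   # ['B2'] ['A1', 'spoof', 'A2']
```

Bob's symbol B1 never reaches Alice, and afterwards Bob has processed three incoming symbols while Alice has processed one, so the two parties disagree about the current round. Replacing the attack with `"substitute"` leaves both parties with two received symbols each, exactly as in the noiseless run.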



Figure 1: An illustration of the two insertion/deletion attacks: (i) a deletion followed by an insertion in the same direction (a substitution); (ii) a deletion followed by an insertion in the opposite direction (an out-of-sync attack). The deleted transmission is marked with a cross, and the inserted transmission is marked with a bold arrow. The dashed arrow denotes a possible (non-interrupted) reply.
1.2 Our Results and Techniques

Tree codes with edit distance. When only a single message is to be transmitted (i.e., in the one-way setting), codes that withstand insertions and deletions were first considered by Levenshtein [22]. In such codes, every two codewords must be far apart in their edit distance, a notion of distance that captures the number of insertions/deletions it takes to convert one codeword into another. The edit distance replaces the Hamming distance, which essentially counts the number of bit flips required to turn one codeword into another.

A key ingredient in interactive-communication schemes is the tree code [28], a labeled tree such that the labels on each descending path in the tree can be seen as a codeword. The tree code is parametrized by a distance parameter α, and it holds that any two codewords whose paths diverge from the same node are at least α-apart in their Hamming distance. Encoding a message via a tree code allows a party to eventually obtain the message sent by the other side, as long as not too many errors have happened. This in turn allows the parties to correct errors that previously occurred in the simulation, and revert the simulation back into a correct state [MIEI]. In order to keep the communication overhead a constant factor when tree code encoding is used in interactive schemes, each label of the tree must come from an alphabet of constant size (that is, the size of the alphabet is independent of the tree’s depth and thus independent of the length of the protocol to simulate).

It is only natural to believe that we could obtain interactive-communication schemes that withstand insertions/deletions by replacing tree codes with a stronger notion of codes, namely, edit distance tree codes. In edit distance tree codes, every two codewords (possibly of different lengths) that diverge at a certain point are required to be far apart in their edit distance rather than their Hamming distance. Yet, since the parties are not synchronized, new difficulties arise. To give a simple example, assume Alice sends one of the following two encodings, s_1 = ABCAAABBB and s_2 = ABCABCAAA, and assume Bob receives the string ABCBBB. If Bob knew that Alice thinks she is in round 6 of the protocol, he would decode to s_2; if he knew that Alice thinks she is in round 9, he would decode to s_1. Alas, he does not know which is the case!
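The example can be checked mechanically. The sketch below (the helper names `lcs_len` and `ed` are hypothetical, not from the paper) computes the insertion/deletion edit distance through the identity ED = |x| + |y| − 2·LCS of Fact 2.6 and compares Bob's received string with the length-6 and length-9 prefixes of both encodings.

```python
def lcs_len(x, y) -> int:
    """Length of a longest common subsequence (standard quadratic DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def ed(x, y) -> int:
    """Insertion/deletion edit distance via ED = |x| + |y| - 2 * LCS (Fact 2.6)."""
    return len(x) + len(y) - 2 * lcs_len(x, y)


s1, s2, received = "ABCAAABBB", "ABCABCAAA", "ABCBBB"
for alice_round in (6, 9):
    d1 = ed(s1[:alice_round], received)
    d2 = ed(s2[:alice_round], received)
    print(alice_round, d1, d2, "decode s1" if d1 < d2 else "decode s2")
# 6 6 4 decode s2
# 9 3 7 decode s1
```

The nearest candidate flips with the round Bob assumes Alice is in, which is exactly the ambiguity that a plain edit-distance condition on whole codewords does not resolve.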

To mitigate situations in which not being synchronized may hurt us, we require an even stronger property, namely, we wish that the suffixes (of arbitrary lengths) of any two overlapping codewords have appropriately large edit distance (see Definitions 3.2 and 3.3 for the precise condition). This stronger property guarantees that two “branches” in the tree are far apart in their edit distance, even when they are shifted with respect to each other due to a lack of synchronization possibly caused by previous indels. We can then show that as long as not too many indels occurred in the suffix of the received codeword, the tree code succeeds in recovering the entire sent message. Crucial in this approach is a notion of distance we call suffix distance (Definition 3.9), which measures the amount of noise in a codeword’s suffix. This generalizes a distance measure by Franklin et al. mm (see also Braverman and Efremenko [8]) to the case where the received word may be misaligned with respect to the sent word, due to indels.

Alas, while (Hamming distance) tree codes over a constant alphabet were shown to exist by Schulman [28], it is not clear whether such trees exist for edit distance, and if so, for which distance parameter α, as Schulman’s proof does not carry over to the edit distance case. Our first result (Section 3) shows the existence of edit distance tree codes for any distance parameter α < 1.

Theorem 1.2 (Informal). For any α < 1 and any d, n ∈ ℕ there exists a d-ary edit distance tree code with distance parameter α, of depth n, over a constant-size alphabet.
In Section 3 we prove the existence of such trees, i.e., the above theorem. Similarly to [28], the proof is by induction on the depth of the tree, but the inductive statement needs to be further strengthened in order for the induction to go through.

As in the case of standard tree codes, finding an efficient construction for such trees remains an important open question. Building on the techniques of Gelles, Moitra and Sahai [HES], we give in Appendix E an efficient randomized construction of a relaxed notion of edit-distance tree codes, which we call potent edit distance tree codes (see Definition E.2 for the precise definition). These trees satisfy the edit-distance guarantee almost everywhere, and are good enough to replace the tree-code notion of Theorem 1.2 in most applications.

Theorem 1.3 (Informal). For any α < 1 and any d, n ∈ ℕ there exists a construction of a d-ary potent edit distance tree code of length n over a constant-size alphabet. The construction is efficient and succeeds with overwhelming probability (in n).

While in the rest of the paper (namely, for our coding scheme) we assume the edit-distance notion of Theorem 1.2, all our schemes work the same when the tree is replaced with a potent one; see Appendix F for further details.

Interactive-communication schemes tolerating insertions/deletions. Equipped with edit distance tree codes, we show a protocol that solves the pointer jumping problem over a noisy channel with insertions and deletions, with communication complexity linear in that of the noiseless protocol. Since the pointer jumping problem is complete for two-party interactive communication, this implies a coding scheme that can simulate any protocol over a channel that may introduce insertions/deletions. Specifically, in Sections 4 and D we show that for any ε > 0, any noiseless protocol π and any inputs x, y, there is a scheme that correctly simulates π (that is, produces the transcript π(x, y) at both parties), withstands a 1/18 − ε fraction of edit corruptions, and has linear communication complexity with respect to the communication of π.

Theorem 1.4. For any ε > 0 and for any binary (noiseless) protocol π with communication CC(π), there exists a noise-resilient coding scheme with communication O_ε(CC(π)) that succeeds in simulating π as long as the adversarial edit-corruption rate is at most 1/18 − ε.

Our coding scheme and analysis follow ideas by Braverman and Rao [7] and by Braverman and Efremenko [8], first focusing on channels with a polynomial-size alphabet and then generalizing to channels with a constant-size alphabet; however, the analysis in light of insertions and deletions is more complicated and subtle. In particular, similar to [3], our analysis uses the notion of suffix distance for relating the effect of the noise to the progress of the simulation.

We note again that due to out-of-sync attacks, the parties’ beliefs about the “current” round of the protocol may differ. In the worst case, while Alice reaches the end of the coding protocol (say, round N), Bob may have only reached round (1 − 2p)N, e.g., due to 2pN deletions in his received communication (p is the fraction of edit corruptions in that instance). Therefore, if we wish to tolerate a p-fraction of edit corruptions, it is imperative that the parties output the correct answer already at round (1 − 2p)N. Our coding scheme (Theorem 1.4) satisfies this stricter requirement.
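As a worked instance of this requirement (the numbers are illustrative and not taken from the paper), take p = 1/18 and N = 900:

$$2pN = 2\cdot\tfrac{1}{18}\cdot 900 = 100, \qquad (1-2p)N = \left(1-\tfrac{1}{9}\right)\cdot 900 = 800.$$

So if Bob's received stream suffers 100 deletions, he may only be at round 800 when Alice completes round 900, and both parties must already hold the correct output at round 800.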

Finally, in Appendix G we show that for a family of rigid protocols, in which both parties are required to output the correct value at round (1 − 2p)N, p = 1/6 is an upper bound on the admissible edit-corruption rate.
Theorem 1.5 (Informal). If both parties are required to give output at round (1 − 2p)N, then no coding scheme of length N can tolerate an edit-corruption rate of p = 1/6.

Closing the gap between the upper bound of 1/6 and the resilience of 1/18 achieved by the scheme of Theorem 1.4 is left for future work.

2 Preliminaries 

For any finite set S, we denote by $x \sim S$ the case that x is uniformly distributed over S. All logarithms are taken to base 2. We denote the set $\{1, 2, \ldots, n\}$ by $[n]$. For a set S we denote $S^{\le n} = \bigcup_{0 \le i \le n} S^i$ and $S^* = \bigcup_{i \ge 0} S^i$. Let $s \in S^l$ be a string of length $|s| = l$. For $1 \le i \le j \le l$, we use $s[i]$ to denote the i-th symbol of s and $s[i..j]$ to denote the string $s[i] \circ s[i+1] \circ \cdots \circ s[j]$.

Definition 2.1 (Pointer Jumping Problem). Any communication protocol of T rounds in which the parties alternately exchange bits can be reduced to the following pointer jumping problem PJP(T): Let $\mathcal{T}$ be a binary tree of depth T. Alice’s input X is a set of consistent edges leaving vertices at even depths. Bob’s input Y is a set of consistent edges leaving odd-level vertices. A set of edges is consistent on a specified set of vertices if it contains exactly one edge leaving every vertex in that specified set. Since it is consistent on all the nodes, $X \cup Y$ defines a unique root-to-leaf path. The parties’ goal is to output this unique path. Note that T alternating rounds of communication suffice to compute this path, assuming noiseless channels.
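A minimal sketch of the pointer jumping problem follows. It assumes, as an illustrative representation of our own, that vertices are bit-strings with '' as the root and that each party's consistent edge set is stored as a map from a vertex to the bit of its unique outgoing edge; `random_input` and `pjp_path` are hypothetical names.

```python
import random


def random_input(depth: int, parity: int, rng: random.Random) -> dict:
    """A consistent edge set: one outgoing bit for every vertex whose depth has
    the given parity. Vertex v is a bit-string; bit b leads to v + str(b)."""
    edges = {}
    for d in range(parity, depth, 2):
        for k in range(2 ** d):
            edges[format(k, f"0{d}b") if d else ""] = rng.randrange(2)
    return edges


def pjp_path(X: dict, Y: dict, depth: int) -> str:
    """The unique root-to-leaf path defined by X ∪ Y, returned as its leaf.
    In the noiseless alternating protocol the owner of the current vertex
    announces the next bit, so `depth` rounds suffice."""
    node = ""
    for d in range(depth):
        owner = X if d % 2 == 0 else Y   # Alice owns even depths, Bob odd depths
        node += str(owner[node])
    return node


X = {"": 1, "00": 0, "01": 0, "10": 1, "11": 0}   # Alice: depths 0 and 2
Y = {"0": 0, "1": 0}                               # Bob: depth 1
print(pjp_path(X, Y, 3))   # 101
```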

Definition 2.2 (Protocols). An interactive protocol π for a function f(x, y) is a distributed algorithm that dictates for each party, at every round, the next message (symbol) to send given the party’s input and the messages received so far. Each transmitted symbol is assumed to be from a fixed alphabet Σ. The protocol runs for N rounds (also called the length of the protocol), after which the parties give output. An instance of the protocol on inputs x, y is said to be correct if both parties output f(x, y) at the end of the protocol.

In this work we focus on alternating protocols, where, as long as there is no noise, the protocol runs for 2N rounds in which Alice and Bob send symbols alternately. In the presence of a p-fraction of edit corruptions (i.e., at most 2pN insertion/deletion errors), it is possible that some party receives only N(1 − 2p) symbols throughout the protocol. We say that the protocol is correct in the presence of a p-fraction of edit corruptions if both parties output f(x, y) by round N(1 − 2p) (according to their own round counting, which may differ from that of the other party).

Definition 2.3 (String Matching). We say that $\tau = (\tau_1, \tau_2)$ is a string matching between a sent message sm and a received message rm (denoted $\tau : sm \to rm$) if $|\tau_1| = |\tau_2|$, $del(\tau_1) = sm$, $del(\tau_2) = rm$, and $\tau_1[i] \approx \tau_2[i]$ for all $i = 1, \ldots, |\tau_1|$. Here del is a function that deletes all the *’s in the string, and two characters a and b satisfy $a \approx b$ if $a = b$ or one of a and b is *. We assume that * is a special symbol that does not appear in sm and rm.
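To make the definition concrete, here is a small validity check for a string matching, together with the star count sc used in Definition 2.4. The names `is_string_matching` and `sc` are hypothetical; the example encodes the substitution b → x as one deletion plus one insertion.

```python
STAR = "*"


def is_string_matching(t1: str, t2: str, sm: str, rm: str) -> bool:
    """Definition 2.3: equal lengths, del(t1) = sm, del(t2) = rm, and
    t1[i] ~ t2[i] everywhere (equal symbols, or at least one side is '*')."""
    def delete_stars(t: str) -> str:
        return t.replace(STAR, "")

    return (
        len(t1) == len(t2)
        and delete_stars(t1) == sm
        and delete_stars(t2) == rm
        and all(a == b or STAR in (a, b) for a, b in zip(t1, t2))
    )


def sc(t: str) -> int:
    """Number of '*' symbols in t."""
    return t.count(STAR)


t1, t2 = "ab*c", "a*xc"          # sent "abc", received "axc"
print(is_string_matching(t1, t2, "abc", "axc"), sc(t1) + sc(t2))   # True 2
```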

Definition 2.4 (Edit Distance). The edit distance between $sm \in \Sigma^*$ and $rm \in \Sigma^*$ is defined as $ED(sm, rm) = \min_{\tau : sm \to rm} \big(sc(\tau_1) + sc(\tau_2)\big)$. Here $sc(\tau_i)$ is the number of *’s in $\tau_i$.

Fact 2.5 (Triangle Inequality). For any three strings x, y, z, $ED(x, y) \le ED(x, z) + ED(y, z)$.

Fact 2.6. For any two strings x, y, $ED(x, y) = |x| + |y| - 2 \cdot LCS(x, y)$. Here $LCS(x, y)$ is the length of the longest common subsequence of x and y.
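The two characterizations of ED can be cross-checked numerically. The sketch below computes Definition 2.4 with a standard dynamic program over deletions and insertions, compares it with the LCS formula of Fact 2.6, and spot-checks the triangle inequality of Fact 2.5 on random binary strings. It is an illustration, not a proof, and the function names are hypothetical.

```python
import random


def ed_dp(x, y) -> int:
    """Definition 2.4 by dynamic programming: the fewest '*' symbols in a string
    matching x -> y (each unmatched symbol of x or of y costs one '*')."""
    D = [[i + j if i == 0 or j == 0 else 0 for j in range(len(y) + 1)]
         for i in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            best = 1 + min(D[i - 1][j], D[i][j - 1])   # delete x[i-1] / insert y[j-1]
            if x[i - 1] == y[j - 1]:
                best = min(best, D[i - 1][j - 1])      # align equal symbols
            D[i][j] = best
    return D[len(x)][len(y)]


def lcs_len(x, y) -> int:
    """Length of a longest common subsequence."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def check_facts(trials: int = 300, seed: int = 1) -> bool:
    """Randomized sanity check of Fact 2.6 and of the triangle inequality."""
    rng = random.Random(seed)

    def rand_str():
        return "".join(rng.choice("ab") for _ in range(rng.randrange(8)))

    for _ in range(trials):
        x, y, z = rand_str(), rand_str(), rand_str()
        assert ed_dp(x, y) == len(x) + len(y) - 2 * lcs_len(x, y)   # Fact 2.6
        assert ed_dp(x, y) <= ed_dp(x, z) + ed_dp(y, z)             # Fact 2.5
    return True


print(ed_dp("abc", "axc"), check_facts())   # 2 True
```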
Lemma 2.7. Let $x \in \Sigma^m$ be some given string, $m \ge n$ and $|\Sigma| \ge 4$. For any constant $\alpha \in [0, 1]$, $\Pr_{y \sim \Sigma^n}[ED(x, y) \le \alpha \cdot m] \le 2^{m} \cdot |\Sigma|^{-(1-\alpha)m/2}$.


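Proof. By Fact 2.6 we know that $ED(x, y) \le \alpha m \iff LCS(x, y) \ge \frac{n + (1-\alpha)m}{2}$. We want to upper bound the number of $y \in \Sigma^n$ such that $LCS(x, y) \ge \frac{n + (1-\alpha)m}{2}$; we just enumerate the locations of the common subsequence in x and in y. Then

$$\Pr_{y \sim \Sigma^n}[ED(x, y) \le \alpha m] = \Pr_{y \sim \Sigma^n}\Big[LCS(x, y) \ge \tfrac{n + (1-\alpha)m}{2}\Big] \le \sum_{j = \lceil (n + (1-\alpha)m)/2 \rceil}^{n} \frac{\binom{n}{j}\binom{m}{j}|\Sigma|^{n-j}}{|\Sigma|^{n}} \le 2^{n+m}\,|\Sigma|^{-\frac{n + (1-\alpha)m}{2}} \le 2^{m}\,|\Sigma|^{-\frac{(1-\alpha)m}{2}},$$

where the last inequality uses $|\Sigma| \ge 4$, so that $2^n |\Sigma|^{-n/2} \le 1$.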
3 Edit-distance tree code

In this section we recall the notion of tree-codes [5H] and provide a novel primitive, namely, the 
edit-distance tree-code. 

Definition 3.1 (Prefix Code). A prefix code $C : \Sigma_{in}^n \to \Sigma_{out}^n$ is a code such that $C(x)[i]$ only depends on $x[1..i]$. C can also be considered as a $|\Sigma_{in}|$-ary tree of depth n with symbols written on the edges of the tree, using an alphabet of size $|\Sigma_{out}|$. On this tree, each tree path from the root of length l corresponds to a string $x \in \Sigma_{in}^l$, and the symbol written on the deepest edge of this path corresponds to $C(x)[l]$.

Definition 3.2 (α-bad Lambda). We say that a prefix code C contains an α-bad lambda if, when we consider this prefix code as a tree, there exist four tree nodes A, B, D, E such that $B \ne D$, $B \ne E$, B is D and E’s common ancestor, A is B’s ancestor or B itself, and $ED(AD, BE) < \alpha \cdot \max(|AD|, |BE|)$. Here AD and BE are the strings of symbols along the tree path from A to D and the tree path from B to E, respectively. See Figure 2 for an illustration.

Definition 3.3 (Edit-distance Tree Code). We say that a prefix code $C : \Sigma_{in}^n \to \Sigma_{out}^n$ is an α-edit-distance tree code if C does not contain an α-bad lambda.
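The following brute-force sketch searches a small labeled tree for an α-bad lambda. It rests on assumptions of our own: tree nodes are tuples of input symbols, each node stores the output label of the edge entering it (so $C(x)[i]$ is the label of node $x[1..i]$), and "B is D and E’s common ancestor" is read as B being the vertex where the two paths split, matching the lambda shape of Figure 2. All names are hypothetical, and the search is exponential, so it is only meant for tiny trees.

```python
import itertools
import random


def ed(x, y) -> int:
    """Insertion/deletion edit distance, |x| + |y| - 2 * LCS(x, y) (Fact 2.6)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return len(x) + len(y) - 2 * prev[-1]


def random_prefix_code(d: int, n: int, s: int, seed: int = 0) -> dict:
    """Label every edge of the d-ary tree of depth n with a symbol in range(s);
    labels[v] is the label of the edge entering node v."""
    rng = random.Random(seed)
    return {v: rng.randrange(s)
            for k in range(1, n + 1)
            for v in itertools.product(range(d), repeat=k)}


def path_labels(labels: dict, top: tuple, bottom: tuple) -> list:
    """Symbols on the tree path from `top` down to its descendant `bottom`."""
    return [labels[bottom[:k]] for k in range(len(top) + 1, len(bottom) + 1)]


def find_bad_lambda(labels: dict, alpha: float):
    """Return some alpha-bad lambda (A, B, D, E), or None if there is none."""
    for B in [()] + list(labels):
        below = [v for v in labels if len(v) > len(B) and v[:len(B)] == B]
        for D in below:
            for E in below:
                if D[len(B)] == E[len(B)]:
                    continue                    # the two paths do not split at B
                for a in range(len(B) + 1):
                    A = B[:a]                   # B itself or one of its ancestors
                    AD, BE = path_labels(labels, A, D), path_labels(labels, B, E)
                    if ed(AD, BE) < alpha * max(len(AD), len(BE)):
                        return A, B, D, E
    return None


constant = {v: 0 for k in (1, 2) for v in itertools.product(range(2), repeat=k)}
print(find_bad_lambda(constant, 0.5))                  # ((), (), (0,), (1,))
print(find_bad_lambda({(0,): 0, (1,): 1}, 0.9))        # None
```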

Our main theorem in this section is the existence of edit-distance tree codes.

Theorem 3.4. For any $d \ge 2$, $n > 0$ and $0 < \alpha < 1$, there exists an α-edit-distance tree code of depth n with alphabet sizes $|\Sigma_{in}| = d$ and $|\Sigma_{out}| = (176 \cdot \ldots)$.

Proof. We prove this theorem by induction on n. To this end we define a notion slightly stronger than being α-bad-lambda-free, which we call “excellent”. Intuitively, if a tree code C is excellent, then with good probability C will not cause a bad lambda in a tree code that contains C as a subtree. This allows us to construct lambda-free trees of depth n using lambda-free trees of smaller depth as subtrees.

Definition 3.5 (Potential Probability). For any prefix code $C : \Sigma_{in}^n \to \Sigma_{out}^n$ and any $i \ge 0$, consider C as a tree and define a new (non-regular) tree C' by connecting a simple path of length i to the root of C, making the other end of this path the root of C'. Label each edge along the new path with a symbol from $\Sigma_{out}$ chosen uniformly and independently.

The potential probability $P_i(C)$ is defined as the probability that the new tree C' has an α-bad lambda with A = the root of C' and B = the root of C. See Figure 3 for an illustration.

Definition 3.6 (Excellent). For a constant $c_1 > 1$, which we will fix shortly, we say that a prefix code C is excellent if $P_i(C) \cdot (d \cdot c_1)^i < 1$ for all $i \ge 0$ and C is α-bad-lambda-free.
In the proof, we are going to construct a set $S_n$ which contains only excellent d-ary prefix codes of depth n with output alphabet size s. Since being excellent is a stronger notion than being α-bad-lambda-free, if we can construct $S_n$ and show that it is non-empty for any $n > 0$, then we are done. In each $S_n$, all codes use the same $\Sigma_{in}$ and $\Sigma_{out}$. We have $|\Sigma_{in}| = d$ and $|\Sigma_{out}| = s$, where the conditions we put on s throughout the following proof are given by:

$$s = \max\left\{ \frac{\alpha}{1-\alpha} \cdot (d \cdot c_1)^{\ldots} + 1,\; d^2,\; \left(\frac{\ldots}{c_2}\right)^{\ldots},\; (c_1 \cdot d \cdot 4)^{\ldots} \right\}.$$

Here $c_1 = 44$ and $c_2 = 1/11$. Therefore, taking $s \ge \ldots$ satisfies all the above conditions.

Let us now inductively construct $S_n$. For notational convenience, let $S_0 = \{\text{a single node}\}$. For $n > 0$, let

and define $S_n = \{C \in \bar{S}_n \mid C \text{ is excellent}\}$. From this definition, we directly have that every $C \in S_n$ is excellent, and we are only left to show that $S_n$ is non-empty. In fact, we are going to prove the following claim by induction.

Claim 3.7. For all n, $|S_n| / |\bar{S}_n| \ge c_2 = 1/11$.

Base case (n = 1): Consider the following set:

$S'_1 = \{C \mid C$ is a d-ary prefix code of depth 1 with d different codewords$\}$.

We want to show that $S'_1 \subseteq S_1$. For any $C \in S'_1$, it is clear that C does not have any α-bad lambda: in this depth-1 tree, one can only pick A = B to be the root, and D, E to be two different leaves. Then $ED(AD, BE) = 2$ and $|AD| = |BE| = 1$, so $ED(AD, BE) > \alpha \cdot \max(|AD|, |BE|)$. Now let us consider the potential probability $P_i(C)$. Suppose there exists an α-bad lambda in the tree after adding a path of length i. Then in this α-bad lambda, B is the original root, AB is the path added to the root, and D and E are two different leaves of the tree. There are two cases to consider:

1. $i = |AB| \ge \alpha/(1-\alpha)$: In this case, since $|BE| = 1$, we have $(|AD| - |BE|)/|AD| = 1 - \frac{1}{i+1} \ge \alpha$. As $ED(AD, BE) \ge |AD| - |BE|$, it is not possible that $ED(AD, BE) < \alpha \cdot |AD|$. Thus $P_i(C) = 0$.

2. $i = |AB| < \alpha/(1-\alpha)$: In this case, let W be the event that one of the labels along the path AB equals one of the d codewords of C. Clearly, when W does not happen, $ED(AD, BE) = |AD| + |BE| > \alpha \cdot |AD|$. It is also immediate (by a union bound over the i labels and the d codewords) that $\Pr[W] \le i \cdot d / s$. We require $s \ge \ldots$, and get that $P_i(C) \cdot (d \cdot c_1)^i \le \Pr[W] \cdot (d \cdot c_1)^i < 1$.

Therefore, we have proved that all the prefix codes in $S'_1$ are excellent and therefore in $S_1$. Then, because $s \ge d^2$, we get $|S_1| / |\bar{S}_1| \ge (1 - 1/d)^d \ge 1/11 = c_2$.
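For depth-1 codes, the potential probability of Definition 3.5 can be computed exactly by enumerating every labeling of the added path. The sketch below (the name `potential_probability_depth1` is hypothetical) does this for d = 2 leaves labeled 0 and 1 over an output alphabet of size s = 4 with α = 3/4, so that α/(1 − α) = 3; these parameters are our own illustrative choice.

```python
import itertools
from fractions import Fraction


def ed(x, y) -> int:
    """Insertion/deletion edit distance, |x| + |y| - 2 * LCS(x, y) (Fact 2.6)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return len(x) + len(y) - 2 * prev[-1]


def potential_probability_depth1(leaf_labels, s: int, i: int, alpha: Fraction) -> Fraction:
    """P_i(C) for a depth-1 prefix code whose leaf edges carry `leaf_labels`:
    enumerate all s**i labelings of the added path AB and look for an
    alpha-bad lambda with A = new root, B = old root, D != E two leaves."""
    bad = 0
    for path in itertools.product(range(s), repeat=i):
        if any(ed(list(path) + [c_d], [c_e]) < alpha * (i + 1)   # |AD| = i+1 >= |BE| = 1
               for D, c_d in enumerate(leaf_labels)
               for E, c_e in enumerate(leaf_labels) if D != E):
            bad += 1
    return Fraction(bad, s ** i)


alpha = Fraction(3, 4)                     # alpha / (1 - alpha) = 3
print([str(potential_probability_depth1([0, 1], 4, i, alpha)) for i in range(5)])
# ['0', '1/2', '3/4', '0', '0']
```

The output matches the two cases: $P_i(C)$ vanishes once $i \ge 3$, and for $i = 1, 2$ the nonzero values stay below the union bound $i \cdot d / s$ on the label-collision event W (here 1/2 and 1).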
Inductive step: Suppose we already know that $|S_i| / |\bar{S}_i| \ge c_2$ for all $1 \le i < n$, and let us prove that $|S_n| / |\bar{S}_n| \ge c_2$. We will use the following lemma, whose proof is quite straightforward and can be found in Appendix B.

Lemma 3.8. If $|S_i| / |\bar{S}_i| \ge c_2$ for all $1 \le i < n$, then for any $x \in \Sigma_{out}^j$, where $0 \le j \le n$, and any input string y, it holds that $\Pr_{C \sim \bar{S}_n}[C(y)[1..j] = x] \le \ldots$

In order to show that $|S_n| / |\bar{S}_n| \ge c_2$, we can choose C randomly from $\bar{S}_n$ and show that $\Pr[C \text{ is excellent}] \ge c_2$.

To this end, consider the conditions of being excellent in turn:

1. Let us first consider $\Pr[P_i(C) \cdot (d \cdot c_1)^i \ge 1]$. By Lemma 3.8 and Lemma 2.7, each probability of the form $\Pr[ED(z \circ C(x), C(y)) < \alpha \cdot \max(i + n_1, n_2)]$ is bounded via the corresponding probability $\Pr[ED(z \circ x, y) < \alpha \cdot \max(i + n_1, n_2)]$ for random strings, and summing these bounds gives $\mathbb{E}_{C \sim \bar{S}_n}[P_i(C)] \le \frac{1}{36} (c_1 \cdot d \cdot 4)^{-i}$. By Markov’s inequality, $\Pr[P_i(C) \cdot (d \cdot c_1)^i \ge 1] \le \frac{1}{36} \cdot \frac{1}{4^i}$. Then

$$\sum_{i=0}^{\infty} \Pr[P_i(C) \cdot (d \cdot c_1)^i \ge 1] \le \frac{1}{36} \sum_{i=0}^{\infty} \frac{1}{4^i} = \frac{1}{27} < \frac{5}{11} = \frac{1 - c_2}{2}.$$

2. Now let us consider the event that C has an α-bad lambda with A = B = root. It is easy to see that this event is equivalent to $P_0(C) \ge 1$. Therefore it is covered by the previous case.

3. Now let us consider the event that C has an α-bad lambda with A = root and B ≠ root. By Lemma 3.8 and Definition 3.6, summing over the depth $n_1$ of B, we have

$$\Pr[C \text{ has an } \alpha\text{-bad lambda with } A = \text{root},\ B \ne \text{root}] \le \cdots \le \frac{5}{11} = \frac{1 - c_2}{2}.$$

4. Finally, let us consider the case that C has an α-bad lambda with A ≠ root. It is easy to see that this event never happens, because the subtrees of C rooted at depth 1 are all excellent.

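Therefore, by the union bound, $\Pr[C \text{ is excellent}]$ is at least

$$1 - \sum_{i=0}^{\infty} \Pr[P_i(C) \cdot (d \cdot c_1)^i \ge 1] - \Pr[C \text{ has an } \alpha\text{-bad lambda with } A = \text{root},\ B \ne \text{root}] \ge 1 - \frac{1 - c_2}{2} - \frac{1 - c_2}{2} = c_2.$$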
3.1 Decoding of edit-distance tree codes

The decoding of a codeword $rm \in \Sigma_{out}^*$ via an edit-distance tree code C amounts to finding a message whose encoding minimizes the suffix distance to rm, i.e., $DEC(rm) = \arg\min_{m \in \Sigma_{in}^{\le n}} SD(C(m), rm)$, where the suffix distance $SD(\cdot, \cdot)$ is defined as follows.


Definition 3.9 (Suffix Distance). Given any two strings $sm, rm \in \Sigma^*$, the suffix distance between sm and rm is

$$SD(sm, rm) = \min_{\tau : sm \to rm}\ \max_{1 \le i \le |\tau_1|} \frac{sc(\tau_1[i..|\tau_1|]) + sc(\tau_2[i..|\tau_2|])}{|\tau_1| - i + 1 - sc(\tau_1[i..|\tau_1|])}.$$
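Definition 3.9 can be evaluated by brute force on very short strings. The sketch below enumerates string matchings, scores each one by its worst suffix ratio, takes the minimum, and plugs the result into the argmin decoder DEC described above. The formula leaves a zero denominator (a suffix of $\tau_1$ consisting only of *’s) implicit; here we assume such a ratio counts as +∞. All names are hypothetical, and the enumeration is exponential.

```python
import math
from fractions import Fraction

STAR = "*"


def matchings(sm: str, rm: str):
    """Yield every string matching (t1, t2): sm -> rm built from aligned equal
    symbols, deletions (sm symbol over '*') and insertions ('*' over rm symbol).
    Columns ('*', '*') are skipped: they only add stars and never lower SD."""
    if not sm and not rm:
        yield "", ""
        return
    if sm and rm and sm[0] == rm[0]:
        for t1, t2 in matchings(sm[1:], rm[1:]):
            yield sm[0] + t1, rm[0] + t2
    if sm:
        for t1, t2 in matchings(sm[1:], rm):
            yield sm[0] + t1, STAR + t2
    if rm:
        for t1, t2 in matchings(sm, rm[1:]):
            yield STAR + t1, rm[0] + t2


def suffix_score(t1: str, t2: str):
    """Max over suffixes of (stars in both suffixes) / (non-star symbols of t1's suffix)."""
    worst = Fraction(0)
    for i in range(len(t1)):                          # i is 0-based here
        stars = t1[i:].count(STAR) + t2[i:].count(STAR)
        sent = len(t1) - i - t1[i:].count(STAR)
        if sent == 0:
            return math.inf                           # assumption: x/0 read as +infinity
        worst = max(worst, Fraction(stars, sent))
    return worst


def suffix_distance(sm: str, rm: str):
    """Brute-force SD(sm, rm); exponential, tiny strings only."""
    return min(suffix_score(t1, t2) for t1, t2 in matchings(sm, rm))


def decode(rm: str, encode, messages):
    """DEC(rm): the candidate message whose encoding has the smallest SD to rm."""
    return min(messages, key=lambda m: suffix_distance(encode(m), rm))


print(suffix_distance("ab", "a"), suffix_distance("ab", "b"))   # 1 1/2
```

Deleting the last sent symbol costs more than deleting the first one, because the suffix distance weighs each error against the length of the suffix it falls in.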


The following lemma, which plays an important role in the analysis of the simulation protocols that we present in the next sections, shows that if a message sm is encoded by some α-edit-distance tree code and the received message rm is sufficiently close to sm in suffix distance, then the receiver can recover the entire sent message correctly.


Lemma 3.10. Let $C : \Sigma_{in}^n \to \Sigma_{out}^n$ be an α-edit-distance tree code, and let $rm \in \Sigma_{out}^m$ (m can be different from n). There exists at most one $sm \in \bigcup_{i \le n} C(\Sigma_{in}^n)[1..i]$ such that $SD(sm, rm) < \ldots$

The proof appears in Appendix B.


4 A coding scheme with a polynomial alphabet size 

In this section, we show a protocol π that solves PJP(T) in O(T) rounds over channels with alphabet size poly(T), and is resilient to a constant fraction of insertion/deletion errors. Since PJP(T) is complete for interactive communication, this implies that any binary protocol with T rounds can be simulated in O(T) rounds over a channel with a polynomially large alphabet that corrupts at most a 1/18 − ε fraction of the transmissions. While this protocol does not exhibit a constant rate, it contains all the main ideas of the constant-rate protocol of Theorem 1.4, and thus we focus on this simpler variant first. Then, in Section D we discuss how to reduce the alphabet size and achieve a protocol with O(T) communication complexity with the same resilience guarantees.

Assume π has 2N alternating rounds, that is, Alice and Bob send N symbols each, assuming there are no errors. We would like the protocol to resist a p-fraction of edit corruptions, that is, the protocol should succeed as long as there are at most 2pN insertion/deletion errors. Due to our assumption that the adversary never causes the protocol to “get stuck”, this amounts to at most 2pN deletions, where each deletion is followed by an insertion.

We assume that Alice and Bob share some fixed α-edit-distance tree code $C : \Sigma_{in}^N \to \Sigma_{out}^N$ given by Theorem 3.4. We will set the values of N, α, and … later. Currently we only need to know that N = poly(T).

Let us begin with a high-level outline of the protocol π. The protocol progresses by sending edges of the tree $\mathcal{T}$ of the underlying PJP(T), interactively constructing the joint path (similar to [7]). To communicate an edge e, the parties encode it as a pair of numbers (n, s), where 0 ≤ n ≤ … and s ∈ {1, 2, 3, 4}. The value n indicates the number of some previous round in which some edge e' was sent, and the value s determines e as the s-grandchild of e'. That is, we always send an edge e by linking it to an edge e' that was previously sent, such that e' is located two levels above e on the unique path leading from the root to e. If e does not have a grandparent (e.g., it is at the first or second level of $\mathcal{T}$), we set n = 0. Sometimes the parties have no edge to send, in which case they set n = … and say that e is an empty edge. We take $\Sigma_{in}$ to be all the possible encodings (n, s). As 0 ≤ n ≤ … and 1 ≤ s ≤ 4, we have $|\Sigma_{in}|$ = poly(N) = poly(T). As $|\Sigma_{out}|$ is polynomial in $|\Sigma_{in}|$ (Theorem 3.4), we also have $|\Sigma_{out}|$ = poly(N) = poly(T).

The protocol $\pi$ is described in Protocol 1. The description is for Alice's side; Bob's part is
symmetric. Here we explain in more detail than the pseudocode how to obtain $E(d_A)$. The decoded word $d_A$ is a
string of symbols in $\Sigma_{in}$, each of the form $(n, s)$, and $E(d_A)$ is the set of edges
these symbols represent. To get the edge a symbol $(n, s)$ represents, we first find the
edge sent in the $n$-th round and take its grandchild determined by $s$. If $n$ is not in the correct
range, then we consider $d_A$ as not valid.
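The decoding rule for $E(d_A)$ can be made concrete with a short Python sketch. It is only an illustration under our own assumptions, not part of the protocol specification: an edge is named by the bit string of the node it leads into (the root is the empty string), `None` is a hypothetical marker for the empty-edge symbol, and a symbol with $n = 0$ is resolved relative to the root, so in this sketch it can only name one of the four second-level edges (first-level edges are not modelled). The helper name `decode_edges` is hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Symbol = Optional[Tuple[int, int]]  # (n, s); None is a hypothetical empty-edge marker


def decode_edges(d: Sequence[Symbol]) -> Optional[Set[str]]:
    """Return E(d) as a set of node labels, or None if d is not valid.

    Illustrative convention: an edge is named by the bit string of the node it
    enters, and "round 0" is a virtual reference to the root "".
    """
    sent: Dict[int, Optional[str]] = {0: ""}
    for rnd, sym in enumerate(d, start=1):
        if sym is None:                  # empty edge: nothing was sent this round
            sent[rnd] = None
            continue
        n, s = sym
        if not (0 <= n < rnd and 1 <= s <= 4):
            return None                  # pointer (or s) out of range: d is invalid
        base = sent[n]
        if base is None:                 # points at a round that sent no edge
            return None
        sent[rnd] = base + format(s - 1, "02b")  # s-th grandchild of that edge
    return {e for r, e in sent.items() if r > 0 and e is not None}
```

For instance, `decode_edges([(0, 2), None, (1, 4)])` returns `{"01", "0111"}`: the third symbol points back to the edge sent in round 1 and selects its fourth grandchild. Because every symbol points strictly backwards, a single left-to-right pass suffices.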


Protocol 1 The protocol vr 

Let $\mathcal{T}$ be given by $PJP(T)$. Recall that Alice's input is $X$. Assume the parties share some fixed $\alpha$-edit-distance tree code $C : \Sigma_{in}^N \to \Sigma_{out}^N$.

Initially we set the counter $i = 0$. For every leaf node $v$ in $\mathcal{T}$ we initialize a counter $s(v) = 0$. Run the following
$N$ times.

1. $i \leftarrow i + 1$.

2. Receive a symbol $r_A[i]$ from the other party. (For Alice, if $i = 1$, skip this step.)

3. Find $d_A \in \Sigma_{in}^*$ which minimizes $SD(d_A, r_A[1..i])$.

4. If $E(d_A) \cup X$ has a unique path from the root in $\mathcal{T}$, do the following. Here $E(d_A)$ is the set of edges
indicated by $d_A$; if $d_A$ is not a valid string of symbols, $E(d_A) = \emptyset$.

(a) If this path reaches a leaf node $v$, then $s(v) \leftarrow s(v) + 1$.

(b) Let $e$ be the deepest edge on the unique path from the root. If $e \in X$, and $e$ is either an edge
in the first or second level of $\mathcal{T}$ or $e$'s grandparent has been sent, set $s_A[i]$ to be the encoding of $e$;
otherwise set $s_A[i]$ to be the encoding of an empty edge.

5. If $E(d_A) \cup X$ does not have a unique path from the root in $\mathcal{T}$, set $s_A[i]$ to be the encoding of an empty
edge.

6. If $i = N(1 - 2\rho)$, output the leaf node $v$ with the largest $s(v)$.

7. Send $C(s_A[1..i])[i]$ to Bob.
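Steps 4 and 5 of Protocol 1 can be sketched in Python as follows, reusing the hypothetical convention that an edge is named by the bit string of the node it enters. We read “$E(d_A) \cup X$ has a unique path from the root” as: starting at the root, at most one outgoing edge of the set is present at every node along the way. This reading and the function names are our assumptions; the argument `sent` stands for the set of edges this party has already sent.

```python
from typing import Optional, Set


def unique_path(edges: Set[str]) -> Optional[str]:
    """Walk from the root "" along edges of the set.

    Returns the last node reached, or None if some node on the way has both
    child edges in the set (so the path from the root is not unique).
    """
    v = ""
    while True:
        kids = [v + b for b in "01" if v + b in edges]
        if len(kids) > 1:
            return None
        if not kids:
            return v
        v = kids[0]


def edge_to_send(d_edges: Set[str], x: Set[str], sent: Set[str]) -> Optional[str]:
    """Steps 4(b) and 5 of Protocol 1; None stands for the empty edge."""
    end = unique_path(d_edges | x)
    if not end:                    # no unique path, or no edge at all
        return None
    e = end                        # deepest edge = the edge entering `end`
    if e in x and (len(e) <= 2 or e[:-2] in sent):
        return e                   # first/second level, or grandparent already sent
    return None
```

For example, with Alice's edges `X = {"1", "001", "010", "101", "111"}` and a decoded set `{"10"}`, the path is root, `1`, `10`, `101`, so Alice sends `101` if she has already sent its grandparent `1`, and the empty edge otherwise.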


We now analyze Protocol 1 and prove that it resists up to a $(1/18 - \varepsilon)$-fraction of edit corruptions.
Let $N_A$ and $N_B$ be the counter $i$ of Alice and Bob, respectively, when one of them reaches the
end of the protocol $\pi$. Let $\tau_A = (\tau_1, \tau_2)$ be the string matching between $s_B[1..N_B]$ and $r_A[1..N_A]$
that is consistent with the protocol. Let $\tau_B = (\tau_3, \tau_4)$ be the string matching between $s_A[1..N_A]$
and $r_B[1..N_B]$. Recall that we use $sc(\tau)$ to denote the number of $*$'s in the string $\tau$. By definition,
$sc(\tau_1) + sc(\tau_3) \le 2\rho N$ and $sc(\tau_2) + sc(\tau_4) \le 2\rho N$.

In the analysis we count the number of rounds in which Alice correctly decodes the entire 
(current) set of edges sent by Bob. We call each such round a good decoding. 

Definition 4.1 (Good Decoding). When a party decodes a message, we say it is a good decoding if
the decoded message is exactly the one sent by the other side (i.e., $d_A = s_B[1..i]$ or $d_B = s_A[1..i]$,
assuming the other side is at round $i$), and the symbol just received is not an adversarial insertion.
If a decoding is not good, we call it a bad decoding.

In the following lemma we relate the number of good decodings to the noise, and show
that as long as the noise is small enough, there are many rounds with good decodings. The proof
can be found in Appendix C.

Lemma 4.2. Alice has at least $N_A + (1 - \frac{2}{\alpha})sc(\tau_2) - (1 + \frac{2}{\alpha})sc(\tau_1)$ good decodings. Bob has at least
$N_B + (1 - \frac{2}{\alpha})sc(\tau_4) - (1 + \frac{2}{\alpha})sc(\tau_3)$ good decodings.
After establishing that Alice and Bob have many good decodings, we show that this
implies they make good progress in constructing their joint path in the underlying $PJP(T)$.
Consider the $N_A + N_B$ decodings that happen during the protocol, and sort them in their natural
order; we say that these decodings occur at “times” $t = 1, 2, \ldots, N_A + N_B$. Note that the decodings
need not alternate: Eve's insertions and deletions may cause one party to perform several
consecutive decodings while the other party does not receive any symbol and performs no decoding.
We also assume that for each decoding, the sending of the next symbol happens at the same “time”
as the decoding. Let $e_A(t)$ be the set of edges Alice has sent by time $t$, and $e_B(t)$ the set of edges
Bob has sent by time $t$. Let $P$ be the correct path of length $T$ of $PJP(T)$. Define $l(t)$ to be the
length of the longest path from the root using edges in $P \cap (e_A(t) \cup e_B(t))$. Intuitively, $l(t)$ measures
how much progress Alice and Bob have made. Let us also define $m(i)$ to be the first time $t$ such
that $l(t) \ge i$. For notational convenience, let $m(0) = 0$.
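The quantities $l(t)$ and $m(i)$ are easy to compute from a trace of the protocol. The following Python sketch (with hypothetical helper names, and edges again named by the node they enter) takes the correct path $P$ as a list of edges and, for each time $t$, the cumulative set $e_A(t) \cup e_B(t)$. Since the path must start at the root, $l(t)$ is the length of the longest prefix of $P$ contained in that set.

```python
from typing import List, Sequence, Set


def progress(path: Sequence[str], sent_by_time: Sequence[Set[str]]) -> List[int]:
    """l(t) for t = 1, 2, ...: length of the longest prefix of the correct path P
    whose edges all lie in the cumulative set e_A(t) | e_B(t)."""
    out = []
    for sent in sent_by_time:
        k = 0
        while k < len(path) and path[k] in sent:
            k += 1
        out.append(k)
    return out


def first_times(l: Sequence[int], T: int) -> List[int]:
    """m(i) for i = 0..T: the first time t (1-based) with l(t) >= i, and m(0) = 0.
    An entry is -1 if progress i is never reached."""
    m = [0]
    for i in range(1, T + 1):
        m.append(next((t for t, lt in enumerate(l, start=1) if lt >= i), -1))
    return m
```

Note that an edge deep in $P$ (such as `1010` below) does not count towards $l(t)$ until all edges above it on $P$ have been sent.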

The following lemma shows that if the parties do not make progress, many bad decodings (and
thus many errors) must have occurred. The proof can be found in Appendix C.

Lemma 4.3. For $i = 0, \ldots, T - 1$, if $m(i + 1)$ exists, then during times $m(i) + 1, \ldots, m(i + 1) - 1$
the following is true.

1. If $i$ is odd, then there are no good decodings of Bob, and the number of good decodings of Alice
is at most the number of bad decodings of Bob.

2. If $i$ is even, then there are no good decodings of Alice, and the number of good decodings of Bob
is at most the number of bad decodings of Alice.

Combining the above lemmas, we get the main theorem for protocols with a polynomial-size
alphabet.

Theorem 4.4. For any $\varepsilon > 0$, the protocol $\pi$ of Protocol 1 with $N = \lceil T/(16\varepsilon) \rceil$ and a $(1 - \varepsilon)$-edit-distance tree
code solves $PJP(T)$ and is resilient to a $(1/18 - \varepsilon)$-fraction of edit corruptions.

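Proof. Set $\rho = \frac{1}{18} - \varepsilon$ and $\alpha = 1 - \varepsilon$. Let $g_A$ be the number of good decodings of Alice and $b_A = N_A - g_A$
the number of bad decodings of Alice. Similarly, let $g_B$ be the number of good decodings of Bob,
and let $b_B = N_B - g_B$. Recall that $sc(\tau_1) + sc(\tau_3) = sc(\tau_2) + sc(\tau_4) \le 2\rho N$ (each deletion is followed by an insertion); then by Lemma 4.2 we
have

$$b_A + b_B \le \frac{2}{\alpha}\big(sc(\tau_1) + sc(\tau_2)\big) + sc(\tau_1) - sc(\tau_2) + \frac{2}{\alpha}\big(sc(\tau_3) + sc(\tau_4)\big) + sc(\tau_3) - sc(\tau_4) \le \frac{8\rho N}{\alpha}.$$

Then we have

$$\begin{aligned}
g_A = N_A - b_A &\ge N_A - \frac{8\rho N}{\alpha} = \big(N_A - N(1 - 2\rho)\big) + N(1 - 2\rho) - \frac{8\rho N}{\alpha} \\
&\ge b_A + b_B - \frac{8\rho N}{\alpha} + \big(N_A - N(1 - 2\rho)\big) + N(1 - 2\rho) - \frac{8\rho N}{\alpha} \\
&= b_A + b_B + \big(N_A - N(1 - 2\rho)\big) + N\Big(1 - \frac{16\rho}{\alpha} - 2\rho\Big) \\
&\ge b_A + b_B + \big(N_A - N(1 - 2\rho)\big) + N\big(1 - (1 - 18\varepsilon)(1 + 2\varepsilon)\big) \\
&> b_A + b_B + \big(N_A - N(1 - 2\rho)\big) + 16\varepsilon N \;\ge\; b_A + b_B + \big(N_A - N(1 - 2\rho)\big) + T.
\end{aligned}$$

Here the fourth step uses $1/\alpha \le 1 + 2\varepsilon$ (valid since $\varepsilon < 1/18 \le 1/2$), so that $\frac{16\rho}{\alpha} + 2\rho \le 18\rho(1 + 2\varepsilon) = (1 - 18\varepsilon)(1 + 2\varepsilon)$; the fifth uses $1 - (1 - 18\varepsilon)(1 + 2\varepsilon) = 16\varepsilon + 36\varepsilon^2$; and the last uses $16\varepsilon N \ge T$.
Similarly, we have $g_B > b_A + b_B + (N_B - N(1 - 2\rho)) + T$.

Using Lemma 4.3, we deduce that by time $m(T)$ the number of good decodings Alice may have
is bounded by $T + b_B$. From this point on, every good decoding at Alice's side adds one vote
for the correct leaf, giving at least $g_A - (T + b_B) > b_A + (N_A - N(1 - 2\rho))$ votes for that node
by the end of the protocol. Since at most $N_A - N(1 - 2\rho)$ of these votes are cast after round $N(1 - 2\rho)$, the correct leaf has at least $b_A + 1$ votes when Alice reaches round $N(1 - 2\rho)$ and
gives her output. On the other hand, any wrong leaf can get at most $b_A$ votes; thus Alice
outputs the correct leaf node at round $N(1 - 2\rho)$. By similar reasoning, Bob also outputs the
correct leaf node when he reaches round $N(1 - 2\rho)$. □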
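The parameter arithmetic in the proof above can be checked mechanically. The Python sketch below uses exact rational arithmetic to confirm, on a grid of values $0 < \varepsilon < 1/18$, that $16\rho/\alpha + 2\rho \le (1 - 18\varepsilon)(1 + 2\varepsilon)$ and that $1 - (1 - 18\varepsilon)(1 + 2\varepsilon) = 16\varepsilon + 36\varepsilon^2$; it also computes the smallest $N$ with $16\varepsilon N \ge T$. This is a sanity check of the inequalities, not a substitute for the proof, and the function names are ours.

```python
import math
from fractions import Fraction


def check_parameters(eps: Fraction) -> bool:
    """Verify the two numeric facts used in the proof of Theorem 4.4."""
    rho = Fraction(1, 18) - eps
    alpha = 1 - eps
    lhs = 16 * rho / alpha + 2 * rho                 # 16*rho/alpha + 2*rho
    bound = (1 - 18 * eps) * (1 + 2 * eps)           # (1 - 18 eps)(1 + 2 eps)
    return lhs <= bound and 1 - bound == 16 * eps + 36 * eps ** 2


def rounds_needed(T: int, eps: Fraction) -> int:
    """Smallest N with 16 * eps * N >= T, i.e. N = ceil(T / (16 eps))."""
    return math.ceil(Fraction(T) / (16 * eps))
```

For example, `rounds_needed(1000, Fraction(1, 100))` evaluates to 6250.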

5 A coding scheme with a constant alphabet size

Based on the protocol in Section 4, with some modifications, we obtain a protocol that has a
constant-size alphabet and a constant rate. To this end, we show how to encode each edge using a
varying-length encoding over a constant-size alphabet. Although substantially more technically
involved, this protocol is a fairly straightforward extension of the protocol presented above. We
thus defer the detailed analysis of this protocol to Appendix D.

Theorem 5.1. For any $\varepsilon > 0$, the simulation $\pi'$ of Protocol 2, with $N = O_\varepsilon(T)$ rounds as set there and a $(1 - \varepsilon)$-edit-distance
tree code, solves $PJP(T)$ and is resilient to a $(1/18 - \varepsilon)$-fraction of edit corruptions.

Since $PJP(T)$ is complete for interactive communication, the above theorem proves Theorem 1.4.
A More details about the noise model

Here we give some justification for our modeling choices, and show that the model is weak enough to
be reasonable (i.e., non-trivial), yet strong enough to generalize other natural models.

As explained above, there are two main paradigms for distributed protocols in the asynchronous
setting.

1. Message-driven model: In this model, each party wakes up and replies only when it receives
a message. If several events occur “simultaneously”, we assume a worst-case scenario in which
the adversary determines the order of events. We show that allowing arbitrary noise patterns
makes the noise model too strong: either the protocol halts in the middle, or the noise pattern limits the
amount of interaction between the two parties.

(a) If a deletion occurs (without being followed by an insertion, as in our model), then both
parties clearly get stuck. More generally, it follows that at any point of the protocol,
the number of insertions must be at least the number of deletions. We now show that this
restriction still yields a noise model that is too strong to allow any resilient protocols.

(b) Assume the adversary is allowed to make up to $N/c$ insertions anywhere during the
protocol. We show that this limits the protocol to performing at most $c + 1$ interactions between Alice and Bob (where each “interaction” is sending a message of arbitrary length,
which depends on messages received in prior interactions). It then follows that one
cannot obtain a constant-rate resilient protocol that withstands a constant fraction $1/c$ of
noise, as limiting the number of interactions may cause an exponential increase in the
communication.
Claim A.1. Suppose Alice and Bob each send $N$ symbols in total and the adversary can
make up to $N/c$ insertion/deletion errors. Then the adversary can make Alice and Bob
have at most $c + 1$ interactive rounds (alternations).

Proof. Without loss of generality, assume Alice speaks first. The adversary
mounts the following attack:

i. After Alice sends Bob a symbol, the adversary inserts $N/c$ symbols from Alice to
Bob.

ii. Bob receives all the $1 + N/c$ symbols and replies to all of them (recall that each incoming
message triggers a single reply from Bob). The adversary ensures that the order
of events is such that Bob first answers all messages and only then Alice receives
them (these events are simultaneous, so Eve gets to decide their internal order).

iii. Alice receives all the $N/c + 1$ symbols and replies to all of them, one by one. Again, Eve
sets the internal order of events so that Alice first replies to all messages before Bob
starts replying.

iv. Keep doing this until $N$ symbols are sent by both parties.

It is easy to see that the protocol has at most $c + 1$ interactive rounds, where each
interaction contains $N/c$ symbols that can depend on previous blocks. □

2. Clock-driven model: In this model, each party has a clock and the party wakes up only
when its clock ticks. The parties' clocks are not assumed to be synchronized or correlated in
any way. We need to carefully describe what happens when the clock ticks do not alternate:

(a) If a party wakes up and there is no incoming message, then we take a worst-case assumption
and allow the adversary to set the message that party sees.

(b) If a party wakes up and more than a single message was sent to him during that time, we
take a worst-case assumption that the party only sees the last incoming message.¹

The following claim shows that the above model can be reduced to the model of edit
corruptions we use in this paper.

Claim A.2. The above clock-driven model can be reduced to a message-driven setting with
edit corruptions. That is, any protocol in our model that is resilient to $\varepsilon N$ edit corruptions
is also a resilient protocol in the clock-driven setting, which is resilient to up to $\varepsilon N$ “non-alternating” clock ticks. A clock tick is considered non-alternating if the previous clock tick
belongs to the same party.

Proof. Suppose there are two consecutive clock ticks made by the same party; without loss of generality,
assume they are made by Alice. Denote these two clock ticks as $A(1)$ and $A(2)$, and
assume that Bob's clock ticks after $A(2)$. From Bob's view, he sees only the message Alice
sends at $A(2)$, so Alice's first message is deleted. When Alice wakes up at $A(2)$, which immediately follows
$A(1)$ (i.e., there are no other events in between), her incoming message queue is empty,
so the adversary determines the symbol she sees; this is exactly an insertion. It follows that
the above non-alternating clock tick has exactly the same effect as a deletion followed
by an insertion. It is easy to verify that multiple non-alternating ticks have the same effect
as multiple edit corruptions. □
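The reduction in Claim A.2 can be illustrated by replaying a clock-tick schedule under the two worst-case rules (a) and (b). In the Python sketch below (a toy model with hypothetical names), each tick lets the waking party read its mailbox and then send one message; a wake-up with an empty mailbox counts as an insertion, and every message overwritten before being read counts as a deletion. Messages still pending when the schedule ends are not counted.

```python
from typing import Dict, List, Tuple


def simulate_ticks(ticks: str) -> Tuple[int, int, int]:
    """Replay a clock-tick schedule such as "AABAB" under rules (a) and (b).

    Returns (non_alternating_ticks, deletions, insertions).
    """
    mailbox: Dict[str, List[str]] = {"A": [], "B": []}
    other = {"A": "B", "B": "A"}
    deletions = insertions = 0
    for k, p in enumerate(ticks):
        if k > 0:                        # the very first tick has nothing to read
            if not mailbox[p]:
                insertions += 1          # rule (a): the adversary sets the message
            else:
                deletions += len(mailbox[p]) - 1  # rule (b): only the last is seen
            mailbox[p].clear()
        mailbox[other[p]].append(f"{p}{k}")      # p sends one message per tick
    non_alternating = sum(a == b for a, b in zip(ticks, ticks[1:]))
    return non_alternating, deletions, insertions
```

On the schedule `"AAABAB"`, for instance, it reports two non-alternating ticks, two deletions, and two insertions, matching the one-to-one correspondence used in the proof.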


¹We show a positive result for this setting, i.e., a resilient protocol. Thus it is better to take worst-case assumptions,
as relaxing them can only yield better resilience.
B Missing proofs of Section 3

Proof of Lemma 3.8. By the way we construct $S_n$ based on $S_{n-1}$, and using the fact that for
\S‘ I 

i < n, > C 2 , we have 

Pr [C{y)[l..j] = x] = • Pr [C{y)[2...j] = x[2..j]] 

U U 

< Pr [C{y)[2...j]= x[2..j]] 

C‘^Sn — \ 


< s ^+^C 2 ^ Pr [C{y)[j\= x[j\] 

CfjSn-i + l 


□ 

Proof of Lemma 3.10. We prove the lemma by contradiction. Suppose there exist two distinct messages
$sm, sm'$ such that both $SD(sm, rm) < \alpha/2$ and $SD(sm', rm) < \alpha/2$. We are going
to show that this results in an $\alpha$-bad lambda.

Consider the tree of the edit-distance tree code $C$. Let $D$ and $E$ be the tree nodes such that
the path from the root to $D$ denotes message $sm$ and the path from the root to $E$ denotes message $sm'$.
Let $B$ be the lowest common ancestor of $D$ and $E$ in the tree.

Let $\tau$ and $\tau'$ be the string matchings chosen in $SD(sm, rm)$ and $SD(sm', rm)$, respectively. Let $l = |\tau_1|$ and
$l' = |\tau'_1|$. Choose $i, i'$ such that $del(\tau_1[i..l]) = BD$ and $del(\tau'_1[i'..l']) = BE$. Next, consider the
strings $del(\tau_2[i..l])$ and $del(\tau'_2[i'..l'])$. Both of them are suffixes of $rm$. Without loss of generality,
assume $|del(\tau_2[i..l])| \le |del(\tau'_2[i'..l'])|$. Therefore $del(\tau_2[i..l])$ is also a suffix of $del(\tau'_2[i'..l'])$, and
there exists $j \le i$ such that $del(\tau_2[j..l]) = del(\tau'_2[i'..l'])$. Now consider $del(\tau_1[j..l])$. Since $j \le i$,
$del(\tau_1[i..l])$ is a suffix of $del(\tau_1[j..l])$. So there is a node $A$, which is $B$'s ancestor or $B$ itself, such
that $AD$ corresponds to $del(\tau_1[j..l])$; in particular, $|AD| = l - j + 1 - sc(\tau_1[j..l])$.

Now we have

$$\begin{aligned}
ED\big(AD, del(\tau_2[j..l])\big) &\le sc(\tau_1[j..l]) + sc(\tau_2[j..l]) \\
&= \frac{sc(\tau_1[j..l]) + sc(\tau_2[j..l])}{l - j + 1 - sc(\tau_1[j..l])} \cdot \big(l - j + 1 - sc(\tau_1[j..l])\big) \\
&\le SD(sm, rm) \cdot |AD| \\
&\le \frac{\alpha}{2} \cdot |AD|.
\end{aligned}$$


Similarly, we have

$$ED\big(BE, del(\tau'_2[i'..l'])\big) \le \frac{\alpha}{2} \cdot |BE|.$$

Since $del(\tau_2[j..l]) = del(\tau'_2[i'..l'])$, by Fact 2.5 we have

$$\begin{aligned}
ED(AD, BE) &\le ED\big(AD, del(\tau_2[j..l])\big) + ED\big(BE, del(\tau'_2[i'..l'])\big) \\
&\le \frac{\alpha(|AD| + |BE|)}{2} \\
&\le \alpha \cdot \max(|AD|, |BE|).
\end{aligned}$$

This shows that there is an $\alpha$-bad lambda in the tree code, and therefore we get a contradiction. □
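The first step of the chain above, $ED(AD, del(\tau_2[j..l])) \le sc(\tau_1[j..l]) + sc(\tau_2[j..l])$, says that a string matching with $k$ padding symbols induces an edit script of at most $k$ operations. The Python sketch below illustrates this under our assumptions that $ED$ counts insertions and deletions only and that the non-$*$ positions of a matching agree; the distance is computed as $|a| + |b| - 2\,\mathrm{LCS}(a, b)$. Function names are hypothetical.

```python
def indel_distance(a: str, b: str) -> int:
    """Insertion/deletion edit distance: len(a) + len(b) - 2 * LCS(a, b)."""
    prev = [0] * (len(b) + 1)            # LCS table row for the previous prefix of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def star_bound(tau1: str, tau2: str) -> int:
    """sc(tau1) + sc(tau2): number of '*' padding symbols in a string matching."""
    return tau1.count("*") + tau2.count("*")


def delete_stars(t: str) -> str:
    """del(t): remove the padding symbols."""
    return t.replace("*", "")
```

For the matching $\tau_1 = $ `abc*de`, $\tau_2 = $ `ab*xde`, both the distance between `abcde` and `abxde` and the number of $*$'s equal 2.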


C Missing proofs of Section 4


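Proof of Lemma 4.2. We prove the lemma for Alice; the lemma holds for Bob for the same reason.
Consider the following procedure on the string matching $\tau_A = (\tau_1, \tau_2)$:

1. Set $t = |\tau_2|$.

2. While $t \ne 0$ do

(a) If $\tau_2[t] = *$, then mark $t$ as bad and set $t \leftarrow t - 1$.

(b) Else if $\max_{t' \le t} \frac{sc(\tau_1[t'..t]) + sc(\tau_2[t'..t])}{t - t' + 1 - sc(\tau_1[t'..t])} \ge \frac{\alpha}{2}$, then
let $t' = \arg\max_{t' \le t} \frac{sc(\tau_1[t'..t]) + sc(\tau_2[t'..t])}{t - t' + 1 - sc(\tau_1[t'..t])}$, mark $t', \ldots, t$ as bad, and set $t \leftarrow t' - 1$.

(c) Else mark $t$ as good and set $t \leftarrow t - 1$.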


First, we claim that the number of bad $t$'s is bounded by $\frac{2}{\alpha}(sc(\tau_1) + sc(\tau_2)) + sc(\tau_1)$. The reason
is the following. In the above procedure, only steps (a) and (b) can mark some $t$'s as bad. Each
time we mark $t', \ldots, t$ as bad (even if $t' = t$, e.g., by step (a)), we know that $\frac{sc(\tau_1[t'..t]) + sc(\tau_2[t'..t])}{t - t' + 1 - sc(\tau_1[t'..t])} \ge \frac{\alpha}{2}$.
So the number of bad $t$'s among $t', \ldots, t$, namely $t - t' + 1$, is bounded by $\frac{2}{\alpha}\big(sc(\tau_1[t'..t]) + sc(\tau_2[t'..t])\big) + sc(\tau_1[t'..t])$.
Therefore, summing over all intervals $t', \ldots, t$ that were marked bad, the total number of bad $t$'s is
bounded by $\frac{2}{\alpha}(sc(\tau_1) + sc(\tau_2)) + sc(\tau_1)$.

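Next, we claim that the number of good decodings is at least the number of good $t$'s. This
implies that for Alice, the number of good decodings is at least $N_A + sc(\tau_2) - \frac{2}{\alpha}(sc(\tau_1) + sc(\tau_2)) - sc(\tau_1)$,
since the number of good $t$'s equals the length of $\tau_1$, namely $N_A + sc(\tau_2)$, minus the number of bad
$t$'s, which is at most $\frac{2}{\alpha}(sc(\tau_1) + sc(\tau_2)) + sc(\tau_1)$. The claim holds since each good $t$ corresponds to a time at which Alice
receives a symbol sent by Bob (a good $t$ has $\tau_2[t] \ne *$ by step (a), and $\tau_1[t] \ne *$, since otherwise the window $t' = t$ alone would have an unbounded ratio and $t$ would be marked bad by step (b)), and the suffix distance between the message Bob sent and the
message Alice received is at most $\max_{t' \le t} \frac{sc(\tau_1[t'..t]) + sc(\tau_2[t'..t])}{t - t' + 1 - sc(\tau_1[t'..t])} < \frac{\alpha}{2}$. By Lemma 3.10, we know that at
that time the message Bob sent is the only one with suffix distance less than $\alpha/2$ with respect to
Alice's received word. So at that time, Alice decodes correctly. □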
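The marking procedure from the proof of Lemma 4.2 is straightforward to implement. The Python sketch below follows steps (a) to (c) literally on a string matching given as two equal-length strings, with `*` as the padding symbol. It only performs the counting; the fact that good positions yield correct decodings relies on Lemma 3.10 and the tree code. A window whose $\tau_1$ part consists only of $*$'s is given an infinite ratio, and ties in the arg max are broken towards the smallest $t'$; both choices are ours.

```python
import math
from typing import List


def mark_good(tau1: str, tau2: str, alpha: float) -> List[bool]:
    """Backward marking procedure from the proof of Lemma 4.2.

    tau1/tau2 are the two rows of a string matching, with '*' for padding.
    Returns good[t - 1] for the 1-based positions t = 1..|tau2|.
    """
    assert len(tau1) == len(tau2)
    good = [False] * len(tau2)
    t = len(tau2)
    while t != 0:
        if tau2[t - 1] == "*":                 # step (a): a deletion, mark bad
            t -= 1
            continue
        best, arg = -1.0, t
        for tp in range(1, t + 1):             # ratio of the window tau[tp..t]
            s1 = tau1[tp - 1:t].count("*")
            s2 = tau2[tp - 1:t].count("*")
            denom = t - tp + 1 - s1
            ratio = math.inf if denom == 0 else (s1 + s2) / denom
            if ratio > best:
                best, arg = ratio, tp
        if best >= alpha / 2:                  # step (b): mark arg..t bad
            t = arg - 1
        else:                                  # step (c): mark t good
            good[t - 1] = True
            t -= 1
    return good
```

On $\tau_1 = $ `abc*de`, $\tau_2 = $ `ab*xde` with $\alpha = 0.9$, positions 3 to 6 are marked bad and positions 1 and 2 good, which respects the lower bound $N_A + (1 - 2/\alpha)sc(\tau_2) - (1 + 2/\alpha)sc(\tau_1) \approx 0.56$ with $N_A = 5$.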


Proof of Lemma 4.3. We only prove the lemma for odd $i$'s; the lemma for even $i$'s
follows for the same reason. At time $m(i)$, Alice sends the $i$-th edge on $P$. Then during times
$m(i) + 1, \ldots, m(i + 1) - 1$, there are no good decodings of Bob; otherwise Bob would decode
correctly and send the $(i + 1)$-th edge on $P$. So during times $m(i) + 1, \ldots, m(i + 1) - 1$, all the good
decodings are Alice's. Notice that for each good decoding that happens at Alice's side, at that
very same time Alice receives a symbol that was sent by Bob (and not a symbol inserted by the
adversary). Recall that Bob sends a symbol only after he performs a decoding; hence, prior to
each good decoding of Alice, there must be a decoding at Bob's side. However, there are no good
decodings of Bob during this time period. Also, at time $m(i)$, the decoding is a decoding of Alice.
So in the time period $m(i) + 1, \ldots, m(i + 1) - 1$, the number of good decodings of Alice is at most
the number of bad decodings of Bob. □












D Detailed protocol and analysis for Theorem 5.1

In this section we give a protocol $\pi'$ that solves the $PJP(T)$ task, takes $N = O_\varepsilon(T)$ rounds, and is
resilient to a $(1/18 - \varepsilon)$-fraction of edit corruptions. In contrast to Protocol 1 presented above,
$\pi'$ communicates symbols from a constant-size alphabet, and thus has communication complexity
$CC(\pi') = O_\varepsilon(T)$; that is, it has constant rate. This completes the proof of Theorem 1.4.

The Outline. Generally speaking, $\pi'$ (Protocol 2) follows the same high-level ideas as Protocol 1,
except that each edge $e$ transmitted in $\pi$ is encoded via a varying-length binary encoding. More
specifically, each edge $e = (n, s)$, with $n$ a previous round number and $s \in [4]$, is encoded as the binary description of $(i - n, s)$, where
$i$ is the current round number at the sender's side. The values $n$ and $s$ have the same meaning as
in Protocol 1: $n$ is a pointer to a previous round number in which the sender started to send an
edge $e'$, and $s$ indicates that the new edge $e$ is the $s$-th grandchild of the edge $e'$. An empty edge is
encoded as $(0, 0)$; that is, we assume $s$ takes values in $\{0\} \cup [4]$. Hence, the above binary description
has length at most $\log(i - n) + 4$ bits. It can be shown that the amortized encoding length is $O(1)$;
the proof is similar to the proof in [7].
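One encoding that meets the stated length bound is the plain binary expansion of $i - n$ followed by $s$ written with exactly three bits (enough for $s \in \{0\} \cup [4]$): its length is $\lfloor \log_2(i - n) \rfloor + 1 + 3 \le \log_2(i - n) + 4$. The Python sketch below implements this hypothetical encoding; it is not claimed to be the paper's exact format. It is not self-delimiting by itself, so the end of a description would have to be signalled separately (for instance via the per-edge “finished” bit sent in each cycle). Empty edges are not encoded, since they are never inserted into the EdgeTable.

```python
import math


def edge_description(i: int, n: int, s: int) -> str:
    """Hypothetical binary description of (i - n, s): the binary expansion of
    i - n followed by s written with exactly three bits."""
    delta = i - n
    if delta < 1 or not 0 <= s <= 4:
        raise ValueError("expected n < i and s in {0, ..., 4}")
    return format(delta, "b") + format(s, "03b")


def length_bound(i: int, n: int) -> float:
    """The bound log2(i - n) + 4 from the text."""
    return math.log2(i - n) + 4
```

For example, `edge_description(10, 2, 3)` is `"1000011"`, whose 7 bits meet the bound $\log_2 8 + 4 = 7$ with equality.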

Several difficulties arise due to the above varying-length encoding. First, since the channel's
alphabet has a fixed size, it may take several rounds of communication to transmit a single edge.
Furthermore, during these rounds in which $e$ is being transmitted, new symbols are received from the
other side. These symbols may cause the party to realize that it needs to send a different edge $\tilde{e}$,
instead of $e$, which is still in the process of being communicated.

In this case, the party simply adds $\tilde{e}$ to a list of edges to be communicated.
Recall that (some of) these edges may be added to the list due to adversarial noise. Moreover,
such “wrong” edges may have a very long encoding (e.g., when the adversary causes a party to
believe it needs to resend the first edge in the middle of the protocol; this has an encoding of length
$\log N$). To prevent the adversary from delaying the progress of the protocol through these effects, we
make the party send all the edges in the list in parallel. That is, we cycle through the list of edges
and send one bit from the encoding of every edge in the list. That way, long encodings do not delay
the transmission of other edges. Furthermore, we attach to each edge a “liveliness” counter, which
indicates how likely it is for a specific edge to be part of the correct path of the underlying $PJP(T)$:
each time a new symbol is received and the decoding of the incoming message implies that some edge $e$
should be sent, we increase the liveliness of that edge. This way, if an incorrect edge with a very
long binary encoding is added to the list, the noise must keep indicating that this edge is needed in
order to keep it “alive” in the list.

The Fine Details. We assume an alphabet size of $|\Sigma_{in}| = 2^{O(1/\varepsilon^2)}$. Each party maintains a table (the EdgeTable)
that stores all the edges this party is currently communicating. As mentioned
above, the party cycles through the EdgeTable and sends one bit of every edge there. Thus, if
the table holds at most $1/\varepsilon^2$ edges, a single round of communication suffices to send one bit of all the
edges. Otherwise, several rounds of communication may be needed. Assume that the table holds
$E$ edges; we call the process of sending one bit of each of the $E$ edges a cycle. Note that a cycle
takes $\lceil E/(1/\varepsilon^2) \rceil$ rounds of communication; each such round is called a page, and we say that a page
is full if it contains $1/\varepsilon^2$ edges. For simplicity, we assume that new edges are added to the table
only when a cycle is completed. Similarly, edges that were fully communicated are removed from
the table at the end of a cycle.

The EdgeTable contains the following fields: 
• Edge: contains the edge $e$ in the tree $\mathcal{T}$ of the underlying $PJP(T)$.


• Edge Binary Description: contains the varying-length binary description of e, as described 
above. 

• Current Sent Length: the number of bits of the binary description that were already communicated.

• Live Points: a number describing the liveliness of the edge. When $e$ is added to the table, it
gets $1/\varepsilon$ live points. At any future round where $e$ is to be added to the table, if it is already
in the table, its live points increase by $1/\varepsilon$. Every time we communicate a bit of $e$, its live
points decrease by 1. If the live points of some edge reach 0 at the end of a cycle, the edge is
removed from the table.


Let us now formally define the process of a cycle: transmitting one bit from each edge in the
EdgeTable. In fact, along with one bit per edge, we also send some metadata, which is described
in Procedure 1 below. Each symbol in our encoding needs to hold 4 bits for each of up to $1/\varepsilon^2$
edges, plus one bit marking the last page, so it is clear that $|\Sigma_{in}| = 2^{O(1/\varepsilon^2)}$ suffices. We note that the information sent in each cycle
suffices to exactly recover the EdgeTable at the other side.


Procedure 1 Cycle(EdgeTable) 

repeat $\lceil E/(1/\varepsilon^2) \rceil$ times ($E$ is the number of edges in the EdgeTable):

The next symbol to communicate consists of the following information: 

1. One bit indicating whether this page is the last page in the cycle.

2. For each $e$ of the next $1/\varepsilon^2$ edges in the EdgeTable, include:

(a) the next bit of $e$'s “Edge Binary Description” to be transmitted;

(b) one bit indicating whether $e$ has “Live Points” = 0;

(c) one bit indicating whether $e$ has “Current Sent Length” = the length of its “Edge Binary
Description”;

(d) one bit indicating whether $e$ was added to the table at the end of the last cycle.
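The paging in Procedure 1 can be sketched as follows. This Python code is an illustration with a hypothetical table layout: $1/\varepsilon$ is passed as an integer `inv_eps`, so a page holds `inv_eps**2` edges and a symbol carries $1 + 4/\varepsilon^2$ bits. The procedure does not say whether flags (b) and (c) refer to the state before or after the current cycle; we assume the latter (the live points after this cycle's decrement, and whether the bit being sent is the last one), which is our reading only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:                 # one EdgeTable row (hypothetical layout)
    desc: str                # "Edge Binary Description"
    sent: int = 0            # "Current Sent Length"
    live: int = 0            # "Live Points"
    is_new: bool = False     # added at the end of the last cycle


def bits_per_symbol(inv_eps: int) -> int:
    """One last-page bit plus 4 bits for each of up to 1/eps^2 edges."""
    return 1 + 4 * inv_eps ** 2


def cycle_pages(table: List[Entry], inv_eps: int) -> List[Tuple[bool, list]]:
    """Split one cycle into pages of at most 1/eps^2 edges each.

    Each page is (is_last_page, [(bit, dead, finished, new), ...]); the flags
    follow our reading that they describe the state after this cycle.
    """
    per_page = inv_eps ** 2
    n_pages = max(1, math.ceil(len(table) / per_page))
    pages = []
    for p in range(n_pages):
        fields = [
            (e.desc[e.sent],               # (a) next bit to transmit
             e.live - 1 == 0,              # (b) live points hit 0 after this cycle
             e.sent + 1 == len(e.desc),    # (c) this is the last bit
             e.is_new)                     # (d) added at the end of the last cycle
            for e in table[p * per_page:(p + 1) * per_page]
        ]
        pages.append((p == n_pages - 1, fields))
    return pages
```

With `inv_eps = 2` and five edges, a cycle consists of two pages: the first one is full (four edges) and the second carries the last-page flag.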


We now describe the UpdateTable($e$) procedure. This procedure is called after every cycle; it adds
the edge $e$ to the table and removes edges that either were fully transmitted or whose “Live Points”
have reached 0. The parties add edges to their EdgeTable using this update process, described in Procedure 2.


Procedure 2 UpdateTable(e) 

1. Add e: 

(a) If $e$ is not an empty edge and $e$ is not in the table, then insert $e$ into the table. Compute its
“Edge Binary Description”. Set “Current Sent Length” = 0 and “Live Points” = $1/\varepsilon$.

(b) If $e$ is not an empty edge and $e$ is in the table, increase its “Live Points” by $1/\varepsilon$.

2. Maintain liveliness, and remove dead edges: 

(a) Decrease “Live Points” of each edge in the table by 1. Increase “Current Sent Length” of each 
edge in the table by 1. 

(b) Remove all the edges with “Live Points” = 0 or “Current Sent Length” = the length of its “Edge 
Binary Description”. 
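Procedure 2 can be sketched in Python as below, again with $1/\varepsilon$ passed as an integer and a hypothetical dictionary layout keyed by edge. One ordering detail is our assumption: step 2(a) is applied only to edges that were already in the table before this call (i.e., edges that took part in the cycle that just ended), so that a newly inserted edge does not lose its first bit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Row:
    desc: str    # "Edge Binary Description"
    sent: int    # "Current Sent Length"
    live: int    # "Live Points"


def update_table(table: Dict[str, Row], e: Optional[str], desc: str,
                 inv_eps: int) -> None:
    """UpdateTable(e) in place; e is None for the empty edge."""
    previous = list(table)                    # edges that took part in the last cycle
    if e is not None:
        if e not in table:                    # step 1(a): insert a new edge
            table[e] = Row(desc=desc, sent=0, live=inv_eps)
        else:                                 # step 1(b): refresh its liveliness
            table[e].live += inv_eps
    for key in previous:                      # step 2(a): one bit of each was sent
        table[key].live -= 1
        table[key].sent += 1
    for key in [k for k, r in table.items()   # step 2(b): drop dead/finished edges
                if r.live <= 0 or r.sent == len(r.desc)]:
        del table[key]
```

With `inv_eps = 2`, an edge that is never requested again survives one more cycle and is dropped at the next one, whereas requesting it again raises its live points and keeps it in the table.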
We are now ready to describe the protocol $\pi'$, given in Protocol 2. As in Section 4, we
say that $d_A$ is valid if it is the encoding of a set of edges.

Protocol 2 The protocol vr' 

Let $\mathcal{T}$ be given by $PJP(T)$. Recall that Alice's input is $X$. Assume the parties share some fixed $\alpha$-edit-distance tree code $C : \Sigma_{in}^N \to \Sigma_{out}^N$.

Initialize $i = 0$, and $s(v) = 0$ for every leaf node $v$ of $\mathcal{T}$.

Repeat the following $N$ times:

1. $i \leftarrow i + 1$.

2. Receive a symbol $r_A[i]$ from the other party. (For Alice, if $i = 1$, skip this step.)

3. Find $d_A \in \Sigma_{in}^*$ which minimizes $SD(d_A, r_A[1..i])$.

4. If $E(d_A) \cup X$ has a unique path from the root in $\mathcal{T}$, do the following. Here $E(d_A)$ is the set of
edges indicated by $d_A$; if $d_A$ is not a valid message, $E(d_A) = \emptyset$.

(a) If this path reaches a leaf node $v$, then $s(v) \leftarrow s(v) + 1$.

(b) Let $e$ be the deepest edge on the unique path from the root. If $e \notin X$, or if $e$ is neither an edge in the
first or second level of $\mathcal{T}$ nor an edge whose grandparent has been sent, set $e$ to be an empty edge.

5. If $E(d_A) \cup X$ does not have a unique path from the root in $\mathcal{T}$, set $e$ to be an empty edge.

6. If a cycle is completed (or if i = 1), call UpdateTable(e). 

7. $s_A[i] \leftarrow$ the next page of Cycle(EdgeTable).

8. Send $C(s_A[1..i])[i]$ to Bob.

9. If $i = N(1 - 2\rho)$, output the leaf node $v$ with the largest $s(v)$.


We now analyze the coding scheme of Protocol 2 and prove our main theorem (Theorem 1.4),
obtained via Theorem 5.1. First, we claim that Protocol 2 has a constant rate. Indeed, it communicates
$2N\log|\Sigma_{out}|$ bits throughout. We use a tree code with $|\Sigma_{in}| = O_\varepsilon(1)$ and $\alpha = 1 - \varepsilon$, and
thus by Theorem 3.4 we get that $|\Sigma_{out}| = O_\varepsilon(1)$ as well. Hence, the total communication of $\pi'$ is
given by

$$CC(\pi') = 2N \log|\Sigma_{out}| = O_\varepsilon(N) = O_\varepsilon(T).$$

The correctness analysis is quite similar to the one of Section 4. As above, let $N_A$ and $N_B$ be
the counter $i$ of Alice and Bob when one of them reaches the end of the protocol. Let $\tau_A = (\tau_1, \tau_2)$
be the string matching between $s_B[1..N_B]$ and $r_A[1..N_A]$ that is consistent with the protocol. Let
$\tau_B = (\tau_3, \tau_4)$ be the string matching between $s_A[1..N_A]$ and $r_B[1..N_B]$. Recall that we use $sc(\tau)$ to
denote the number of $*$'s in the string $\tau$. By definition, $sc(\tau_1) + sc(\tau_3) \le 2\rho N$ and $sc(\tau_2) + sc(\tau_4) \le 2\rho N$.

Let us first prove the following lemma, which shows that the case where the number of edges in
the table exceeds $1/\varepsilon^2$ is very rare.

Lemma D.1. Alice sends at most $\varepsilon \cdot N_A$ full pages, and Bob sends at most $\varepsilon \cdot N_B$ full pages (recall
that a full page is a page that contains $1/\varepsilon^2$ edges).

Proof. We prove the lemma for Alice; the same holds for Bob. Notice that the sum of “Live
Points” of all edges in the table is always non-negative. In each round of Alice, if the UpdateTable
procedure is called, it increases the sum of “Live Points” by at most $1/\varepsilon$. Whenever Alice
sends a page of $1/\varepsilon^2$ edges, the sum of “Live Points” decreases by at least $1/\varepsilon^2$. Alice has $N_A$ rounds
in total. Let $p_A$ be the number of full pages sent by Alice. By a simple counting argument, we have

$$p_A \cdot \frac{1}{\varepsilon^2} \le N_A \cdot \frac{1}{\varepsilon}.$$

Therefore $p_A \le \varepsilon \cdot N_A$. □

We again define good decodings, slightly differently from Section 4: we only
care about decodings that happen at the end of a cycle.

Definition D.2 (Good Decoding). When a party decodes a message, we say that it is a good
decoding if the following hold:

1. The decoding outputs the entire message sent so far by the other side (i.e., $d_A = s_B[1..i]$ or
$d_B = s_A[1..i]$).

2. The symbol just received was not inserted by the adversary.

3. The party has finished a cycle (i.e., UpdateTable($e$) is called in that round).

Otherwise we call it a bad decoding.

Lemma D.3. Alice has at least $N_A(1 - \varepsilon) + (1 - \frac{2}{\alpha})sc(\tau_2) - (1 + \frac{2}{\alpha})sc(\tau_1)$ good decodings. Bob has
at least $N_B(1 - \varepsilon) + (1 - \frac{2}{\alpha})sc(\tau_4) - (1 + \frac{2}{\alpha})sc(\tau_3)$ good decodings.

Proof. We prove the lemma for Alice; the lemma for Bob holds for the same reason. By Lemma 4.2,
the number of decodings of Alice that satisfy the first two constraints in Definition D.2 is at
least $N_A + (1 - \frac{2}{\alpha})sc(\tau_2) - (1 + \frac{2}{\alpha})sc(\tau_1)$. By Lemma D.1, the number of decodings of Alice
that do not satisfy the third constraint in Definition D.2 is at most $\varepsilon \cdot N_A$. So Alice has at least
$N_A + (1 - \frac{2}{\alpha})sc(\tau_2) - (1 + \frac{2}{\alpha})sc(\tau_1) - \varepsilon \cdot N_A$ good decodings. □

As in the analysis of Protocol 1, we now sort all the good and bad decodings of both Alice and
Bob by the time these decodings happen. Since in total we have $N_A + N_B$ decodings, we can assume
these decodings happen at times $t = 1, 2, \ldots, N_A + N_B$. Again, we assume that for each decoding,
the sending procedure right after it happens at the same time as the decoding. Let $e_A(t)$ be the
set of edges Alice has fully communicated up to time $t$, and $e_B(t)$ the set of edges Bob has fully
communicated up to time $t$. Let $P$ be the correct path of length $T$ defined by the underlying $PJP(T)$.
Let $l(t)$ be the length of the longest path starting from the root using edges from $P \cap (e_A(t) \cup e_B(t))$.
Intuitively, $l(t)$ measures how much progress Alice and Bob have made. Let us also define $m(i)$ to be
the first time $t$ such that $l(t) \ge i$.

The following lemma is similar to Lemma 4.3 and shows that many bad decodings are needed
to slow down the progress. For notational convenience, we need the following definition.

Definition D.4. Let $g_A[t_1, t_2]$ and $g_B[t_1, t_2]$ be the number of good decodings of Alice and the
number of good decodings of Bob, respectively, during times $t_1, \ldots, t_2$. Let $b_A[t_1, t_2]$ and $b_B[t_1, t_2]$ be the number
of bad decodings of Alice and the number of bad decodings of Bob, respectively, during times $t_1, \ldots, t_2$.

Lemma D.5. For $i = 0, \ldots, T - 1$, let $l_{i+1}$ be the length of the binary description of the $(i + 1)$-th
edge on $P$ during the first transmission in which this edge is fully communicated. During times
$m(i) + 1, \ldots, m(i + 1)$, the following is true.










1. If $i$ is odd, then

$$g_B[m(i) + 1, m(i + 1)] \le l_{i+1} + \varepsilon \cdot b_B[m(i) + 1, m(i + 1)], \quad\text{and}$$
$$g_A[m(i) + 1, m(i + 1)] \le l_{i+1} + (1 + \varepsilon) \cdot b_B[m(i) + 1, m(i + 1)].$$

2. If $i$ is even, then

$$g_A[m(i) + 1, m(i + 1)] \le l_{i+1} + \varepsilon \cdot b_A[m(i) + 1, m(i + 1)], \quad\text{and}$$
$$g_B[m(i) + 1, m(i + 1)] \le l_{i+1} + (1 + \varepsilon) \cdot b_A[m(i) + 1, m(i + 1)].$$


Proof. We only prove the lemma for odd $i$'s; the case of even $i$'s follows for the same reason.
First, for any time period $t_1, \ldots, t_2$, it holds that $g_A[t_1, t_2] \le g_B[t_1, t_2] + b_B[t_1, t_2]$. This follows since
every good decoding of Alice stems from a symbol that was actually sent by Bob (rather than
inserted by Eve), which implies a decoding at Bob's side (either bad or good).

Since we assume $i$ is odd, at time $m(i)$ Alice finishes sending the $i$-th edge on path $P$. From
the above claim we have $g_A[m(i) + 1, m(i + 1)] \le g_B[m(i) + 1, m(i + 1)] + b_B[m(i) + 1, m(i + 1)]$.
Then, to prove Lemma D.5, it suffices to show that $g_B[m(i) + 1, m(i + 1)] \le l_{i+1} + \varepsilon \cdot b_B[m(i) + 1, m(i + 1)]$,
because this upper bound on $g_B[m(i) + 1, m(i + 1)]$ directly gives us the upper bound
on $g_A[m(i) + 1, m(i + 1)]$.

Let us divide the time period $m(i) + 1, \ldots, m(i + 1)$ into three types of intervals and
bound the number of good decodings of Bob in each. Let $e_{i+1}$ be the $(i + 1)$-th edge of $P$.

1. Intervals $[t_1, t_2]$ in which $e_{i+1}$ is not in Bob's EdgeTable. In such an interval $g_B[t_1, t_2] = 0$, because
if Bob has a good decoding, he adds the $(i + 1)$-th edge of $P$ to his EdgeTable.

2. Intervals $[t_1, t_2]$ where $e_{i+1}$ was inserted into Bob's EdgeTable at $t_1$ but removed at $t_2$,
before Bob has finished sending its encoding. In this time period, the “Live Points” of
$e_{i+1}$ are increased at least $g_B[t_1, t_2]$ times, each time by $1/\varepsilon$. At the same
time, the “Live Points” of $e_{i+1}$ decrease by 1 at most $b_B[t_1, t_2]$ times, until they reach 0
at time $t_2$. We thus have $g_B[t_1, t_2] \le \varepsilon \cdot b_B[t_1, t_2]$.

3. Intervals $[t_1, t_2]$ where $e_{i+1}$ was inserted into Bob's EdgeTable at $t_1$, and the edge's
sending procedure is finally completed at time $t_2$. It is clear that $t_2 = m(i + 1)$, and $g_B[t_1, t_2] \le l_{i+1}$,
since, assuming $e_{i+1}$ is not removed from the table, after $l_{i+1}$ cycles through all the pages
in the table, Bob will have finished sending the encoding of $e_{i+1}$. Recall that a good decoding
implies that Bob completed sending the last page in his table.

To sum up, in cases 1 and 2 we have $g_B[t_1, t_2] \le \varepsilon \cdot b_B[t_1, t_2]$, and case 3 happens
at most once. Therefore, $g_B[m(i) + 1, m(i + 1)] \le l_{i+1} + \varepsilon \cdot b_B[m(i) + 1, m(i + 1)]$, which
completes the proof. □

In the following lemma, we bound the sum of the lengths of the encodings of $P$'s edges, regardless of the progress of the protocol.

Lemma D.6. Given any instance of $\pi'$ that succeeds in fully communicating the first $T' \le T$ edges in $P$, for $i = 1, \ldots, T'$ let $l_i$ be the length of the binary encoding of the $i$-th edge on $P$ as communicated in that instance. For $i > T'$, define $l_i = 0$. Then,

$$\sum_{i=1}^{T} l_i \le T\left(\log\frac{2N}{T} + 4\right).$$
Proof. For the $i$-th edge, let $(\delta_i, s_i)$ be the encoding that was used in that instance of $\pi'$. By definition, $\delta_i$ is the difference between the round $n_i$ where we start sending the encoding of the $i$-th edge and the round $n_{i-2}$ where the first bit of the $(i-2)$-th edge was communicated. Clearly, for any $i > 2$ we have $l_i \le \log(n_i - n_{i-2}) + 4$, while for the first two edges $i = 1, 2$ we have $l_i \le \log(n_i - 0) + 4$.

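Then we have

$$n_1 + \sum_{\substack{3 \le i \le T' \\ i \text{ odd}}} (n_i - n_{i-2}) \le N_A \qquad\text{and}\qquad n_2 + \sum_{\substack{4 \le i \le T' \\ i \text{ even}}} (n_i - n_{i-2}) \le N_B,$$

and by the concavity of the logarithm we have

$$\sum_{i=1}^{T'} l_i \le T' \log\left(\frac{n_1 + n_2 + \sum_{3 \le i \le T'} (n_i - n_{i-2})}{T'}\right) + 4T' \le T'\left(\log\frac{N_A + N_B}{T'} + 4\right) \le T'\left(\log\frac{2N}{T'} + 4\right),$$

where the last step uses $N_A, N_B \le N$.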

Finally, note that for any $M > 0$ the function $x \log(M/x)$ is monotonically increasing in $x \in (0, M/e)$. Taking $M = 2N$, the claim holds for any $T$ as long as $T' \le T < 2N/e$. □
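The concavity step can be sanity-checked numerically. The sketch below draws random start rounds that satisfy the two constraints in the proof (odd-indexed gaps charged to $N_A$, even-indexed gaps to $N_B$) and checks that the sum of the per-edge bounds $\log(n_i - n_{i-2}) + 4$ never exceeds $T(\log(2N/T) + 4)$ for $T' \le T < 2N/e$. The helper names, the base-2 logarithm and the example sizes are assumptions made for illustration.

```python
import math
import random


def per_edge_bound_sum(gaps):
    """Sum of the per-edge bounds l_i <= log2(n_i - n_{i-2}) + 4."""
    return sum(math.log2(g) + 4 for g in gaps)


def lemma_d6_bound(T, N):
    """Right-hand side T * (log2(2N / T) + 4) of Lemma D.6."""
    return T * (math.log2(2 * N / T) + 4)


def random_gaps(count, budget, rng):
    """`count` positive integer gaps whose sum is at most `budget`."""
    cuts = sorted(rng.sample(range(1, budget + 1), count))
    return [b - a for a, b in zip([0] + cuts[:-1], cuts)]


rng = random.Random(1)
N = 1000
for _ in range(200):
    n_a, n_b = rng.randint(800, N), rng.randint(800, N)  # N_A, N_B <= N
    t_prime = rng.randint(2, 100)
    odd = random_gaps((t_prime + 1) // 2, n_a, rng)   # edges 1, 3, 5, ...: charged to N_A
    even = random_gaps(t_prime // 2, n_b, rng)        # edges 2, 4, 6, ...: charged to N_B
    total = per_edge_bound_sum(odd + even)
    for T in (t_prime, t_prime + 50, 700):            # T' <= T < 2N/e (about 735.8)
        assert total <= lemma_d6_bound(T, N) + 1e-9
```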

With the above lemmas we can complete the proof of the main theorem. 

Theorem D.7 (Restatement of Theorem 5.1). For any $\varepsilon > 0$, the simulation $\pi'$ of Protocol 2 with $N = \lceil T/\varepsilon^2 \rceil$ and a $(1 - \varepsilon)$-edit distance tree code solves PJP($T$) and is resilient to a $(1/18 - \varepsilon)$-fraction of edit corruptions.

Proof. For notational convenience, let $g_A = g_A[1, N_A + N_B]$, $g_B = g_B[1, N_A + N_B]$, $b_A = b_A[1, N_A + N_B]$ and $b_B = b_B[1, N_A + N_B]$. Also let $p = 1/18 - \varepsilon$ and $\alpha = 1 - \varepsilon$. The proof is quite similar to that of Theorem 4.4, with some small differences in the parameters: we show that both Alice and Bob have a large number of good decodings (compared to their bad decodings), which implies that they both output the correct value.

Claim D.8.

$$g_A - \Big(\sum_{i=1}^{T} l_i\Big) - \big(N_A - (1-2p)N\big) - (1+\varepsilon)(b_A + b_B) > 0.$$

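Proof. Recall that both $N_A$ and $N_B$ are in the range $[N(1-2p), N]$, because there are at most $2pN$ insertion/deletion errors. By Lemma D.3 we have

$$b_A + b_B \le (N_A - g_A) + (N_B - g_B) \le \frac{8pN}{\alpha} + 2\varepsilon N,$$

where the second inequality substitutes the lower bounds of Lemma D.3 on $g_A$ and $g_B$: these contribute $\varepsilon N_A + \varepsilon N_B \le 2\varepsilon N$, and the remaining terms in $sc(\tau_1), \ldots, sc(\tau_4)$ sum to at most $8pN/\alpha$.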
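Using Lemma D.6 to bound $\sum_i l_i$, and using the facts that $p + \varepsilon \le 1/18$ and that $N \ge T/\varepsilon^2$, we have

$$
\begin{aligned}
&g_A - (1+\varepsilon)(b_A + b_B) - \Big(\sum_{i=1}^{T} l_i\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge N_A - b_A - (1+\varepsilon)(b_A + b_B) - T\Big(\log\frac{2N}{T} + 4\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge N_A - (2+\varepsilon)(b_A + b_B) - T\Big(\log\frac{2N}{T} + 4\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge (1-2p)N - (2+\varepsilon)\Big(\frac{8pN}{\alpha} + 2\varepsilon N\Big) - T\Big(\log\frac{2N}{T} + 4\Big) \\
&\quad\ge 11\varepsilon N - T\Big(\log\frac{2N}{T} + 4\Big) \\
&\quad\ge T\Big(\frac{11}{\varepsilon} - 4 - \log\frac{2}{\varepsilon^2}\Big) \\
&\quad> 0.
\end{aligned}
$$

□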


Similarly, we have $g_B - (1+\varepsilon)(b_A + b_B) - \big(\sum_{i=1}^{T} l_i\big) - \big(N_B - (1-2p)N\big) > 0$.
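The chain above rests on two numerical facts. With $p = 1/18 - \varepsilon$ and $\alpha = 1 - \varepsilon$, the coefficient $(1-2p) - (2+\varepsilon)(8p/\alpha + 2\varepsilon)$ is at least $11\varepsilon$, and $11/\varepsilon - 4 - \log(2/\varepsilon^2) > 0$. The sketch below checks both on a grid of $\varepsilon$ values, using exact rational arithmetic for the first and assuming base-2 logarithms for the second. The parameter `c` stands for the coefficient of the $\varepsilon N$ term; with `c = 4` the same computation covers the $6\varepsilon N$ bound used for the PEDTC variant in Claim F.4. A grid check is a sanity check, not a proof.

```python
import math
from fractions import Fraction


def slack(eps, c):
    """Coefficient of N in (1 - 2p)N - (2 + eps)(8pN/alpha + c*eps*N),
    with p = 1/18 - eps and alpha = 1 - eps (exact rational arithmetic)."""
    p = Fraction(1, 18) - eps
    alpha = 1 - eps
    return (1 - 2 * p) - (2 + eps) * (8 * p / alpha + c * eps)


def normalized_tail(eps, gain):
    """gain/eps - 4 - log2(2/eps^2): the final lower bound divided by T."""
    return gain / eps - 4 - math.log2(2 / eps ** 2)


for j in range(1, 1000):
    eps = Fraction(j, 18000)              # grid over the open interval (0, 1/18)
    assert slack(eps, 2) >= 11 * eps      # 2*eps*N term: Claim D.8
    assert slack(eps, 4) >= 6 * eps       # 4*eps*N term: Claim F.4 (PEDTC)
    assert normalized_tail(float(eps), 11) > 0
    assert normalized_tail(float(eps), 6) > 0
```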

With the above claim and Lemma D.5, it follows that both Alice and Bob finish sending all the edges on $P$. Also, by Lemma D.5 we have

$$g_A[1, m(T)] \le \varepsilon \cdot b_A[1, m(T)] + (1+\varepsilon)\, b_B[1, m(T)] + \sum_{i=1}^{T} l_i.$$


Therefore, we have

$$g_A[m(T)+1, N_A + N_B] = g_A - g_A[1, m(T)] > b_A + \big(N_A - (1-2p)N\big).$$

As Alice could have at most $N_A - (1-2p)N$ good decodings after she gives her output, the number of good decodings in the period between $m(T)$ and the point where Alice gives an output is larger than $b_A$, so the correct leaf gets more than $b_A$ votes. On the other hand, any incorrect leaf gets at most $b_A$ votes; thus Alice outputs the correct leaf. A similar argument applies to Bob. □

E Potent edit-distance tree code 

In this section we show that it is simple to construct a relaxed notion of EDTC which is very 
useful for most applications, namely potent edit-distance tree codes (PEDTC). This follows by 
extending the techniques of Gelles, Moitra and Sahai for constructing a relaxed notion of standard 
tree codes [E], to the edit-distance case. 

Definition E.1 (Bad Interval). For a given $d$-ary prefix code of depth $n$, let $(A, B, D, E)$ be an $\alpha$-bad lambda (Definition 3.2). Let $h$ be the depth of $A$ and $l = \max(|AD|, |AE|)$. We say that the interval $[h, h + l]$ is $\alpha$-bad.
Definition E.2 ($(\delta, \alpha)$-PEDTC). A $(\delta, \alpha)$-bad tree is a prefix code of depth $n$ that has a path $Q$ for which the union of all bad intervals along $Q$ (i.e., all bad intervals for which either the point $E$ or the point $D$ of the $\alpha$-bad lambda inducing that bad interval is a node on $Q$) has a total length of at least $\delta n$.

If the tree is not a $(\delta, \alpha)$-bad tree, then we call it a $(\delta, \alpha)$-potent edit-distance tree code, or $(\delta, \alpha)$-PEDTC for short.

In other words, in a PEDTC, for any given root-to-leaf path $Q$ in the tree, the number of nodes of $Q$ that lie in some bad interval is less than $\delta n$. Next we show that a tree whose labels are chosen at random is a PEDTC with high probability.
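As an illustrative sketch of Definitions E.1 and E.2, the code below takes the $\alpha$-bad lambdas whose $D$ or $E$ lies on a fixed root-to-leaf path $Q$, summarized as hypothetical triples $(h, |AD|, |AE|)$, converts each into its bad interval, and tests whether the union has length at least $\delta n$. Deciding which lambdas are $\alpha$-bad is not modeled. Intervals are stored as half-open pairs $(h, h + l)$ so that an interval has length $l$; this is an assumption about how lengths are measured.

```python
def bad_interval(h, ad, ae):
    """Definition E.1: a bad lambda with A at depth h, |AD| = ad, |AE| = ae
    induces [h, h + max(ad, ae)], stored here as the half-open pair (h, h + l)."""
    return (h, h + max(ad, ae))


def union_length(intervals):
    """Total length of the union of half-open intervals (a, b)."""
    total, cur_a, cur_b = 0, None, None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:      # disjoint from the current run
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                               # overlaps or touches: extend the run
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total


def path_is_bad(lambdas, n, delta):
    """True if the bad intervals along Q cover at least delta * n (Definition E.2)."""
    intervals = [bad_interval(h, ad, ae) for h, ad, ae in lambdas]
    return union_length(intervals) >= delta * n


# Hypothetical bad lambdas touching Q, as (depth of A, |AD|, |AE|).
lambdas = [(0, 3, 5), (2, 4, 1), (20, 2, 2)]
print(union_length([bad_interval(*lam) for lam in lambdas]))  # 8
print(path_is_bad(lambdas, n=40, delta=0.25))                  # False
```

A tree is a $(\delta, \alpha)$-PEDTC exactly when this test fails for every root-to-leaf path.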

Proposition E.3. For any constants $\delta, \alpha \in (0, 1)$, for any $d \ge 2$, and for any $|\Sigma_{out}| \ge (32d^3)^{8/((1-\alpha)\delta)}$, a $d$-ary tree whose labels are independently and uniformly selected from $\Sigma_{out}$ is a $(\delta, \alpha)$-PEDTC with probability at least $1 - 2^{-n}$.

Proof. Assume we construct a tree by picking each label uniformly at random from $\Sigma_{out}$, and assume the obtained tree code is not $(\delta, \alpha)$-potent. Then there exist a path $Q$ and a set of nodes $\{x_1, \ldots, x_k\}$ along it such that for each $x_i$ there exists a lambda $\lambda_{x_i} = (A, B, D, E)_{x_i}$ with $x_i = E$ or $x_i = D$, such that $\lambda_{x_i}$ is an $\alpha$-bad lambda that induces the interval $l_i$, and $\left|\bigcup_i l_i\right| \ge \delta n$.

The following technical lemma says that there exists a subset of $\{x_1, \ldots, x_k\}$ whose nodes induce pairwise-disjoint bad intervals of total length at least $\delta n/2$. Also, note that there are at most $2^n \times 2^n = 4^n$ ways to distribute these disjoint intervals along the path $Q$. This is because, in order to determine the configuration of all intervals, we only have to determine which nodes are in some interval and which nodes are ends of some interval.
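The $4^n$ count can be made concrete. The sketch below encodes a set of pairwise-disjoint intervals of nodes on a path of length $n$ as two bit vectors, one marking covered nodes and one marking the last node of each interval, and checks on small $n$ that the encoding is injective, so there are at most $2^n \cdot 2^n = 4^n$ configurations. The helper names are hypothetical, and intervals are taken as inclusive index ranges on positions $0, \ldots, n-1$.

```python
def encode(intervals, n):
    """Map disjoint inclusive intervals [s, e] on positions 0..n-1 to two bit tuples."""
    covered, ends = [0] * n, [0] * n
    for s, e in intervals:
        for v in range(s, e + 1):
            covered[v] = 1     # node lies in some interval
        ends[e] = 1            # node is the last node of an interval
    return tuple(covered), tuple(ends)


def decode(covered, ends):
    """Inverse of encode: scan left to right, closing an interval at each end mark."""
    intervals, start = [], None
    for v, (c, t) in enumerate(zip(covered, ends)):
        if c and start is None:
            start = v
        if t:
            intervals.append((start, v))
            start = None
    return intervals


def all_configurations(n, start=0):
    """Yield every set of pairwise-disjoint nonempty intervals inside [start, n)."""
    if start >= n:
        yield []
        return
    yield from all_configurations(n, start + 1)        # node `start` uncovered
    for e in range(start, n):                          # interval [start, e]
        for rest in all_configurations(n, e + 1):
            yield [(start, e)] + rest


for n in range(1, 8):
    configs = list(all_configurations(n))
    codes = {encode(c, n) for c in configs}
    assert len(codes) == len(configs) <= 4 ** n        # the encoding is injective
    assert all(decode(*encode(c, n)) == c for c in configs)
```

Adjacent intervals such as $[0, 1]$ and $[2, 3]$ have the same coverage vector as the single interval $[0, 3]$; the end-marker vector is what tells them apart.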

Lemma E.4 ([28]). Let $l_1, \ldots, l_k$ be intervals on $\mathbb{N}$ whose union has length $X$. Then there exists a set of indices $I \subseteq \{1, 2, \ldots, k\}$ such that the intervals indexed by $I$ are pairwise disjoint and their total length is at least $X/2$. That is, for any distinct $i, j \in I$, $|l_i \cap l_j| = 0$, and $\sum_{i \in I} |l_i| \ge X/2$.

A proof is given in [28].
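For intuition, here is one standard way to obtain such a subset; it is not necessarily the argument of [28]. Greedily extract a chain of intervals with the same union, always taking, among the intervals that start inside the already covered prefix, the one reaching furthest to the right. In that chain the $k$-th and $(k+2)$-th intervals are disjoint, so the odd-indexed and the even-indexed members each form a disjoint family, and one of them has total length at least $X/2$. The sketch assumes half-open intervals $(a, b)$ of length $b - a$.

```python
import random


def select_disjoint_half(intervals):
    """From half-open intervals (a, b), return pairwise-disjoint ones whose
    total length is at least half the length of the union."""
    ivs = sorted(intervals)
    chain, i, n = [], 0, len(ivs)
    while i < n:
        cur = ivs[i][0]                          # left end of a new component
        while True:
            best = None
            while i < n and ivs[i][0] <= cur:    # starts inside the covered prefix
                if best is None or ivs[i][1] > best[1]:
                    best = ivs[i]
                i += 1
            if best is None or best[1] <= cur:   # cannot extend: component done
                break
            chain.append(best)                   # reaches furthest to the right
            cur = best[1]

    def total(family):
        return sum(b - a for a, b in family)

    # chain[k] and chain[k + 2] never intersect, so both halves are disjoint.
    odd, even = chain[0::2], chain[1::2]
    return odd if total(odd) >= total(even) else even


rng = random.Random(7)
for _ in range(500):
    ivs = [(a, a + rng.randint(1, 10)) for a in rng.choices(range(50), k=rng.randint(1, 12))]
    chosen = select_disjoint_half(ivs)
    union = len(set().union(*(range(a, b) for a, b in ivs)))
    covered = [x for a, b in chosen for x in range(a, b)]
    assert len(covered) == len(set(covered))     # pairwise disjoint
    assert 2 * len(covered) >= union             # total length >= X / 2
```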

We first consider the probability that some interval $l_i$ is bad when the path $Q$ is given, and determine which lambdas $(A, B, D, E)$ can make the interval bad. We count these lambdas by fixing the points $A, B, D, E$ one by one. As $Q$ and $l_i$ are given, the point $A$ is fixed. Next, consider the positions of $D$ and $E$. We know that one of $D$ and $E$ must be on $Q$ and that $\max(|AD|, |AE|) = |l_i|$. It follows that there are at most $2|l_i| \cdot d^{|l_i|}$ ways to pick $D$ and $E$: there are at most $|l_i|$ ways to pick the point that lies on $Q$ and at most $d^{|l_i|}$ ways to pick the point that is not on $Q$, and another factor of 2 accounts for choosing whether $D$ or $E$ is the one on $Q$. Also note that each such choice determines the position of $B$ along $Q$. So in total, there are at most $2|l_i| \cdot d^{|l_i|}$ lambdas that could induce the bad interval $l_i$ on $Q$.

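Given any lambda $\lambda_{x_i}$, Lemma 2.7 bounds the probability that $\lambda_{x_i}$ is $\alpha$-bad by a quantity that decreases exponentially in $(1-\alpha)\max(|AD|, |AE|) = (1-\alpha)|l_i|$, with a base determined by $|\Sigma_{out}|$. Multiplying by the number of candidate lambdas, the probability that $l_i$ on $Q$ is $\alpha$-bad is at most $2|l_i| \cdot d^{|l_i|}$ times this bound.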

Using Lemma E.4, there exists a set $I$ such that $\{\lambda_{x_i}\}_{i \in I}$ induce disjoint intervals with a total length of at least $\sum_{i \in I} |l_i| \ge \delta n/2$. Since the intervals are disjoint, the events that they are bad are independent, and the probability that a specific pattern of intervals occurs is bounded by the product of the individual probabilities. By setting $|\Sigma_{out}| \ge (32d^3)^{8/((1-\alpha)\delta)}$, we can bound the probability that the tree is $(\delta, \alpha)$-bad by summing over the $d^n$ choices of the path $Q$ and the at most $4^n$ patterns of disjoint intervals:

$$\Pr[\text{the tree is } (\delta, \alpha)\text{-bad}] \le \sum_{Q}\ \sum_{\substack{l_1, \ldots, l_k \text{ disjoint} \\ \sum_i |l_i| \ge \delta n/2}}\ \prod_{i=1}^{k} \Pr[l_i \text{ is a bad interval}] \le (16d^3)^n \cdot (32d^3)^{-n} = 2^{-n}.$$

□
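For a sense of scale, the helper below (a hypothetical name) evaluates $\log_2$ of the alphabet-size threshold $(32d^3)^{8/((1-\alpha)\delta)}$ of Proposition E.3, i.e., the number of bits per output symbol that suffices for the union bound above. The parameter values are arbitrary examples.

```python
import math


def log2_alphabet_threshold(d, alpha, delta):
    """log2 of (32 d^3)^(8 / ((1 - alpha) * delta)), the bound of Proposition E.3."""
    return 8 * math.log2(32 * d ** 3) / ((1 - alpha) * delta)


# Binary tree, alpha = delta = 1/2: 8 * log2(256) / (1/4) = 256 bits per symbol.
print(log2_alphabet_threshold(2, 0.5, 0.5))   # 256.0
# The exponent grows like 1 / ((1 - alpha) * delta): alpha = 0.9 needs about 1280 bits.
print(log2_alphabet_threshold(2, 0.9, 0.5))
```

The alphabet is constant in $n$, but it grows quickly as $\alpha \to 1$ or $\delta \to 0$.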


Similarly to [mm], we can further derandomize the above construction by drawing each label from a small-biased sample space (e.g., [2]). This immediately leads to an efficient (randomized) construction of a PEDTC with a succinct description that succeeds with overwhelming probability. We omit the details and refer the reader to [HIES].


F A coding scheme with a constant alphabet size and PEDTC 

In this section, we show that our scheme works the same when the edit-distance tree code is replaced with a potent edit-distance tree code. Specifically, we prove that Protocol 2 still solves the pointer jumping problem given a PEDTC.

First, we show an analogue of Lemma 3.10, which is used for decoding the tree, for the case of potent trees.

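Lemma F.1. Let $C: \Sigma_{in}^n \to \Sigma_{out}^n$ be a $(\delta, \alpha)$-PEDTC, and let $r_m \in \Sigma_{out}^m$ ($m$ can be different from $n$). If there exists $s_m \in \bigcup_i C(\Sigma_{in}^n)[1..i]$ such that $SD(s_m, r_m)$ is below the decoding threshold of Lemma 3.10, and the end node of $s_m$ on the tree is not in any bad interval of any path $Q$ that contains $s_m$, then $s_m$ is the unique message in $\bigcup_i C(\Sigma_{in}^n)[1..i]$ whose suffix distance from $r_m$ is below this threshold.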

Proof. We prove this lemma by contradiction. Suppose there exist two messages $s_m, s'_m \in \bigcup_i C(\Sigma_{in}^n)[1..i]$ such that both $SD(s_m, r_m)$ and $SD(s'_m, r_m)$ are below the threshold, and the end node of $s_m$ on the tree is not in any bad interval of any path $Q$ that contains $s_m$. By the same argument as in Lemma 3.10, there is an $\alpha$-bad lambda in $C$. Let this $\alpha$-bad lambda be $(A, B, D, E)$; then either $D$ or $E$ is the end point of $s_m$. Therefore, the end point of $s_m$ lies inside some bad interval of some path $Q$, a contradiction. □

We can now prove the main theorem of this section, namely, the existence of a coding scheme 
that assumes PEDTC. 

Theorem F.2. For any $\varepsilon > 0$, the simulation $\pi'$ of Protocol 2 with $N = \lceil T/\varepsilon^2 \rceil$ and an $(\varepsilon, 1-\varepsilon)$-PEDTC instead of an edit distance tree code solves PJP($T$) and is resilient to a $(1/18 - \varepsilon)$-fraction of edit corruptions.
Proof. The proof is very similar to the proof of Theorem 5.1. We use exactly the same notation as in Definitions D.2 and D.4, and it is easy to check that Lemma D.1, Lemma D.5 and Lemma D.6 still hold for Protocol 2 with PEDTC. The only difference is that we need to make a slight change in Lemma D.3; we prove the following lemma as its analogue.

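Lemma F.3. Alice has at least as many good decodings as the lower bound of Lemma D.3 (expressed in terms of $N_A$, $sc(\tau_1)$ and $sc(\tau_2)$), minus $\varepsilon N_B$. Similarly, Bob has at least as many good decodings as the corresponding lower bound of Lemma D.3 (in terms of $N_B$, $sc(\tau_3)$ and $sc(\tau_4)$), minus $\varepsilon N_A$.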

Proof. We prove the lemma for Alice; the lemma for Bob holds for the same reason. Comparing this lemma with Lemma D.3, it suffices to show that the number of new bad decodings of Alice caused by replacing the edit distance tree code with a PEDTC is at most $\varepsilon N_B$. For an instance of Protocol 2 with PEDTC, let $Q$ be the path of $S_B[1..N_B]$ (all the symbols sent by Bob) on the tree of the PEDTC $C$. Comparing Lemma 3.10 and Lemma F.1, we know that the sent message corresponding to each new bad decoding has its end point in some bad interval of $Q$. We also know that each new bad decoding satisfies the constraint that the symbol just received in this decoding was not inserted by the adversary. Hence the sent messages corresponding to different new bad decodings have different end points on the tree of the PEDTC. By the definition of an $(\varepsilon, 1-\varepsilon)$-PEDTC, there are at most $\varepsilon N_B$ nodes on $Q$ that lie in bad intervals. Therefore, the number of new bad decodings is at most $\varepsilon N_B$. □

Next, we prove an analogue of Claim D.8 for Protocol 2 with PEDTC.

Claim F.4.

$$g_A - \Big(\sum_{i=1}^{T} l_i\Big) - \big(N_A - (1-2p)N\big) - (1+\varepsilon)(b_A + b_B) > 0.$$

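Proof. By Lemma F.3 we have

$$b_A + b_B \le (N_A - g_A) + (N_B - g_B) \le \frac{8pN}{\alpha} + 4\varepsilon N,$$

where the only change compared with the proof of Claim D.8 is that Lemma F.3 contributes $2\varepsilon N_A + 2\varepsilon N_B \le 4\varepsilon N$ instead of $\varepsilon N_A + \varepsilon N_B \le 2\varepsilon N$; the terms in $sc(\tau_1), \ldots, sc(\tau_4)$ are the same as before.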


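Using Lemma D.6 to bound $\sum_i l_i$, and using the facts that $p + \varepsilon \le 1/18$ and that $N \ge T/\varepsilon^2$, we have

$$
\begin{aligned}
&g_A - (1+\varepsilon)(b_A + b_B) - \Big(\sum_{i=1}^{T} l_i\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge N_A - b_A - (1+\varepsilon)(b_A + b_B) - T\Big(\log\frac{2N}{T} + 4\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge N_A - (2+\varepsilon)(b_A + b_B) - T\Big(\log\frac{2N}{T} + 4\Big) - \big(N_A - (1-2p)N\big) \\
&\quad\ge (1-2p)N - (2+\varepsilon)\Big(\frac{8pN}{\alpha} + 4\varepsilon N\Big) - T\Big(\log\frac{2N}{T} + 4\Big) \\
&\quad\ge 6\varepsilon N - T\Big(\log\frac{2N}{T} + 4\Big) \\
&\quad> 0.
\end{aligned}
$$

□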


Having proved Claim F.4, to finish the proof we just need to follow exactly the argument after Claim D.8 in the proof of Theorem 5.1. □


G Upper bound on the noise fraction 


In this section we show that no protocol can tolerate an edit-corruption noise fraction of $p = 1/6$ or more, assuming that the parties are required to be correct after receiving $(1-2p)N$ symbols. Intuitively, since the effective protocol is shorter, the noise bound of $1/4$ (which stems from the ability to perform substitutions) decreases. Indeed, corrupting $1/4$ of the effective communication $2(1-2p)N$, that is, $\frac{1}{4} \cdot 2(1-2p)N$ symbols, out of a total communication of $2N$ amounts to a fraction of errors of

$$p = \frac{\frac{1}{4} \cdot 2(1-2p)N}{2N} = \frac{1-2p}{4}, \qquad\text{that is,}\qquad p = \frac{1}{6}.$$
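The arithmetic can be checked with exact fractions. The sketch below (the helper name is hypothetical) solves $p = \frac{1}{4}(1-2p)$ for $p$ and confirms, for the example value $N = 18$, that a quarter of the effective communication $2(1-2p)N$ equals $N/3 = 2pN$ symbols, matching the $N/3$ rounds of Alice that the attack in the proof of Theorem G.1 corrupts.

```python
from fractions import Fraction


def noise_fixed_point(base=Fraction(1, 4)):
    """Solve p = base * 2(1 - 2p)N / (2N) = base * (1 - 2p) for p."""
    # p = base - 2 * base * p  =>  p * (1 + 2 * base) = base
    return base / (1 + 2 * base)


p = noise_fixed_point()
print(p)                                  # 1/6

N = 18                                    # example length, divisible by 3
effective = 2 * (1 - 2 * p) * N           # symbols received by both parties: 24
corruptions = Fraction(1, 4) * effective  # a quarter of the effective communication
print(corruptions, N // 3)                # 6 6
```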


Recall that the requirement to give an output after receiving $(1-2p)N$ symbols comes from Eve's ability to delete $2pN$ symbols at one party, making this party receive at most $(1-2p)N$ symbols throughout the protocol. Below we give a formal attack that demonstrates the bound of $1/6$ under this stringent requirement.


Theorem G.1. Let $\pi$ be an interactive protocol for PJP($T$) that is resilient to a $p$-fraction of edit corruptions, and assume $|\pi| = N$. If both parties are required to output the correct output at round $(1-2p)N$, then, assuming an adversarial edit-corruption rate of $p = 1/6$, no protocol outputs the correct output with probability greater than $1/2$.


Proof. Assume the parties run an instance of $\pi$ on inputs $(x, y) = (0, 0)$. Eve performs the following attack (w.l.o.g., attacking Alice): for the first $N/3$ symbols sent by Alice, Eve deletes those symbols and inserts back to Alice symbols that simulate Bob on input $y = 1$. Then (at round $N/3$ and beyond), Eve does nothing.

Recall that Alice must output the correct value after receiving $(1 - 1/3)N = 2N/3$ symbols. However, her view at this time is indistinguishable from an instance of $\pi$ on inputs $(x, y) = (0, 1)$ in which Eve corrupts Alice's rounds $[N/3, 2N/3]$ by deleting the symbols Alice sends in these rounds and inserting Bob's symbols assuming $y = 0$, and in which the $(N/3)$-th symbol sent by Alice is the first one received by Bob. This implies that Alice cannot determine Bob's input at round $2N/3$ under this attack, which proves the claim. □


If we do not require the parties to output the correct value already at round $(1-2p)N$, then it is possible to show an upper bound of $p = 1/4$ on the fraction of noise, similar to the case of interactive protocols over standard noisy channels [7].


^We mention that it may be possible for Alice to output the correct answer by round N. 
Figure 2: An illustration of the lambda structure



Figure 3: An illustration of the new tree C for defining the potential probability Pi{C) 